reisepass / ETHz_HeadlineGenerator

NLP 2013 INF.ETHz

Index out of bounds error in DocNGramSimple on line 59: ngramWords[i] = words[i]; (see below) #19

Open reisepass opened 11 years ago

reisepass commented 11 years ago

This happens whenever getDocsInCluster(cluster) returns an empty list in DocCluster, line 55:

List clusterDocs = getDocsInCluster(cluster);

reisepass commented 11 years ago

So, when there are no docs assigned to a cluster, I guess.
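To make the suspected cause concrete, here is a minimal sketch of how a cluster can end up with no documents and how that could be detected before any n-gram model is built. The class and method names (`EmptyClusterSketch`, `docsInCluster`, `emptyClusters`) and the representation of assignments as an `int[]` are assumptions for illustration, not the project's actual DocCluster code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the clustering bookkeeping; not the project's DocCluster.
class EmptyClusterSketch {

    // Collects the indices of all documents assigned to the given cluster.
    // Returns an empty list when no document was assigned to it.
    static List<Integer> docsInCluster(int[] assignments, int cluster) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < assignments.length; doc++) {
            if (assignments[doc] == cluster) {
                docs.add(doc);
            }
        }
        return docs;
    }

    // Lists the clusters in [0, k) that received no documents at all.
    static List<Integer> emptyClusters(int[] assignments, int k) {
        boolean[] seen = new boolean[k];
        for (int cluster : assignments) {
            if (cluster >= 0 && cluster < k) {
                seen[cluster] = true;
            }
        }
        List<Integer> empty = new ArrayList<>();
        for (int cluster = 0; cluster < k; cluster++) {
            if (!seen[cluster]) {
                empty.add(cluster);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Three documents, three clusters; cluster 1 gets nothing.
        int[] assignments = {0, 0, 2};
        System.out.println(docsInCluster(assignments, 1));
        System.out.println(emptyClusters(assignments, 3));
    }
}
```

A check like this would let the caller skip or report empty clusters instead of passing an empty document list further down.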

jarednieder commented 11 years ago

That might be it, but I haven't checked the means code since I made the last set of changes, so I'll need to look into it.

On May 18, 2013 8:57 PM, "Mortiffer" notifications@github.com wrote:

So, when there are no docs assigned to a cluster, I guess.

Reply to this email directly or view it on GitHub: https://github.com/rubenwolff/ETHz_HeadlineGenerator/issues/19#issuecomment-18106196

reisepass commented 11 years ago

Since the exception is coming from the function private WordCountTree getCounts(String[] words) when it receives an empty array, why don't we just add a condition at the top:

private WordCountTree getCounts(String[] words) {
    WordCountTree tree = new WordCountTree();
    // Too few words to form a single n-gram: return the empty tree.
    if (words == null || words.length < n)
        return tree;

I put it at <n because the error was actually produced with words.length==1 and words[0]==""
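As a self-contained illustration of why the guard compares against n rather than 0, here is a minimal sketch of n-gram counting with the same early return. It uses a plain `HashMap` in place of the project's `WordCountTree`, and the names `NGramCountSketch` and `countNGrams` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the project's n-gram counting; not the real WordCountTree.
class NGramCountSketch {

    // Counts every run of n consecutive words. Returns an empty map when the
    // input is null or too short to contain even one n-gram.
    static Map<List<String>, Integer> countNGrams(String[] words, int n) {
        Map<List<String>, Integer> counts = new HashMap<>();
        if (words == null || n <= 0 || words.length < n) {
            return counts;
        }
        for (int start = 0; start + n <= words.length; start++) {
            List<String> gram = new ArrayList<>(Arrays.asList(words).subList(start, start + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The failing input described in the thread: one empty string, with n = 2.
        System.out.println(countNGrams(new String[] {""}, 2));
        // A short input that does contain bigrams.
        System.out.println(countNGrams("a b a b".split(" "), 2));
    }
}
```

A check for `words.length == 0` alone would not cover the reported case, because the array there has length 1 and still cannot hold a bigram.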

jarednieder commented 11 years ago

That sounds good. I don't think there's much error checking going on, so we may need to add several checks like this to other functions.

reisepass commented 11 years ago

After doing that, I get a null pointer exception from this call in main:

        probs[i] = new NoFilterAddTestCorpus(trainCluster.getClusterNgramProbs(clusterAssign.get(i)));

Because of the earlier change, trainCluster.getClusterNgramProbs(clusterAssign.get(i)) now returns an empty TreeMap.

The null pointer happens here, because firstEntry() returns null on an empty map:

public NgramSimple(TreeMap<ArrayList<String>, Double> inNgrams) {
    ngramFreq = inNgrams;
    n = inNgrams.firstEntry().getKey().size();
}

Changing this to:

public class NgramSimple implements NGramProbs {
    protected TreeMap<ArrayList<String>, Double> ngramFreq;
    protected int n = 2;

    public NgramSimple(TreeMap<ArrayList<String>, Double> inNgrams) {
        ngramFreq = inNgrams;
        // Keep the default n when the map is empty.
        if (inNgrams.firstEntry() != null)
            n = inNgrams.firstEntry().getKey().size();
    }

So with an empty tree, n defaults to 2.
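The behaviour this fix relies on can be checked in isolation: `TreeMap.firstEntry()` returns null for an empty map. The sketch below is a rough, standalone version of the same fallback. `NgramOrderSketch`, `DEFAULT_N`, `newNgramMap` and the comparator that orders keys by their space-joined text are all assumptions, since the thread does not show how the project's TreeMap is constructed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names are illustrative, not the project's NgramSimple.
class NgramOrderSketch {
    static final int DEFAULT_N = 2;

    // ArrayList is not Comparable, so a TreeMap keyed by it needs a comparator.
    // Ordering by the space-joined words is an assumption made for this sketch.
    static TreeMap<ArrayList<String>, Double> newNgramMap() {
        Comparator<ArrayList<String>> byText =
                Comparator.comparing(gram -> String.join(" ", gram));
        return new TreeMap<>(byText);
    }

    // Reads the n-gram order from the first key, falling back to DEFAULT_N
    // because firstEntry() returns null when the map is empty.
    static int ngramOrder(TreeMap<ArrayList<String>, Double> ngrams) {
        Map.Entry<ArrayList<String>, Double> first = ngrams.firstEntry();
        return first != null ? first.getKey().size() : DEFAULT_N;
    }

    public static void main(String[] args) {
        TreeMap<ArrayList<String>, Double> ngrams = newNgramMap();
        // Empty map: falls back to DEFAULT_N.
        System.out.println(ngramOrder(ngrams));
        ngrams.put(new ArrayList<>(Arrays.asList("hun", "sen", "friday")), 0.5);
        // Non-empty map: uses the size of the first key.
        System.out.println(ngramOrder(ngrams));
    }
}
```

One trade-off of a silent default is that an empty model looks like a valid bigram model to its callers, so logging or skipping empty clusters earlier may still be worth considering.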

reisepass commented 11 years ago

So that lets me finish the main method without errors using just a few of the DUC2004 docs as input. I can't run ROUGE, but the summary turns out to be:

Cambodian leader Hun Sen Friday rejected opposition parties demands talks country King Norodom Sihanouk has declined requests chair summit Cambodia top political leaders Cambodia two-party opposition asked Asian Development Bank Monday providing loans incumbent government

jarednieder commented 11 years ago

That actually sounds pretty legit.

reisepass commented 11 years ago

Actually, I think those are just the first sentences, with some stuff taken out.