papermachines / papermachines

[UNMAINTAINED] A Zotero extension for analysis and visualization in the digital humanities.
BSD 2-Clause "Simplified" License

Phrase net is full of stop words #31

Open brekhusr opened 11 years ago

brekhusr commented 11 years ago

[Images: Phrase-Net-x-a-y, Phrase-Net-x-y]

These two phrase nets did not tell me very much about my texts. Is there a way to avoid this kind of result when working with PDFs that contain a lot of embedded text/metadata?

corajr commented 11 years ago

By adding your own stop words (1 per line) to the file "stopwords.txt" in the Paper Machines data folder, you should be able to get a clearer picture of your data. I will shortly add the ability to add stop words through a comma-separated list in the preferences.
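For anyone who would rather script this than edit the file by hand, here is a minimal sketch of appending stop words one per line. The function name `add_stop_words` and the example path are hypothetical and not part of Paper Machines. The sketch also assumes the file is UTF-8 and that each stop word is a single token without spaces.

```python
from pathlib import Path


def add_stop_words(stopword_file, new_words):
    """Append new stop words to a one-word-per-line file.

    Hypothetical helper, not part of Paper Machines. Assumes a UTF-8 file
    in which every stop word is a single whitespace-free token.
    Returns the list of words that were actually appended.
    """
    path = Path(stopword_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""

    # Track words already in the file so nothing is added twice.
    seen = {word.lower() for word in text.split()}

    to_add = []
    for word in new_words:
        word = word.strip().lower()
        if word and word not in seen:
            seen.add(word)
            to_add.append(word)

    # newline="\n" keeps Unix line endings regardless of platform.
    with path.open("a", encoding="utf-8", newline="\n") as handle:
        # Avoid gluing the first new word onto an unterminated last line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        for word in to_add:
            handle.write(word + "\n")

    return to_add
```

A call might look like `add_stop_words("/path/to/papermachines/stopwords.txt", ["doi", "jstor", "page"])`, where the path is a placeholder for wherever the Paper Machines data folder lives on your machine. Backing up the original file first is probably wise.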

brekhusr commented 11 years ago

When I search my computer for files called stopwords.txt and open the results from the Paper Machines data folder (stopwords, stopwords_en, stopwords_pt, search_stopwords), I don't see "lines" that would allow me to add one stop word per line. I just see an unbroken stream of stop words that don't even have spaces between them. Should I just go to the end and start typing additional stop words? If so, how will it know where I mean to delimit them? Thanks, and sorry to be ignorant!

corajr commented 11 years ago

Ah, the line endings are in Unix format rather than Windows, so it shows up for you without line breaks. I've already implemented a preference pane that will allow additional entries, one per line, so you won't have to navigate to the file or anything. That will be released probably tonight, or as soon as I figure out a bug with geodict (it's about 90% there).
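To illustrate the line-ending issue: a Unix-format file separates lines with a single LF byte, while older Windows editors only break lines on the CR+LF pair, so the words appear to run together. The following is a rough sketch, not anything shipped with Paper Machines, of how one could list the words or make a Windows-readable copy. Both function names are hypothetical, and whether Paper Machines itself would accept a CRLF-converted file is not something this thread confirms, so the conversion is best applied to a copy used only for viewing.

```python
def list_words(data: bytes):
    """Split raw file bytes into lines, whatever the line-ending style.

    Assumes the file is UTF-8 encoded.
    """
    return data.decode("utf-8").splitlines()


def to_windows_line_endings(data: bytes) -> bytes:
    """Return a copy of the data with every line ending converted to CRLF."""
    # Normalize everything to LF first so existing CRLF pairs are not doubled.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")
```

Reading the file in binary mode, for example `open(path, "rb").read()`, and passing the bytes to either function avoids any automatic newline translation along the way.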

brekhusr commented 11 years ago

Terrific! Meanwhile, I'll try writing, in German, to the Austrian National Library, which maintains http://europeana-geo.isti.cnr.it/geoparser, to ask if/when they're planning to bring that back online.

mkane2 commented 9 years ago

Has this been resolved? I get the pane to add one stop word per line, but the added words don't seem to be used, even after multiple restarts of Zotero and Firefox (tried it in both versions) and restarts of the computer. The list of added stop words does persist across restarts.