lil-lab / newsroom

Tools for downloading and analyzing summaries and evaluating summarization systems. https://summari.es/

Data extract doesn't work #8

Closed anhduc114 closed 4 years ago

anhduc114 commented 6 years ago

When I run the command newsroom-extract --archive dev.archive --dataset dev.data to obtain the data file, the command line shows this message: "Loading downloaded summaries: Aborted!" Then it stops. Can you help fix this? Thanks.

grusky commented 6 years ago

Sure, thanks for telling me about this! Does the same thing happen when you run the test URLs included in the repo in the /tests directory? This is a subset of URLs that I know should work.

newsroom-scrape --urls tests/urls.txt --archive tests.archive
newsroom-extract --archive tests.archive --dataset tests.data

I've also heard of other people having trouble using the repo with Python 3.5 or lower. What versions of Python and the repository dependencies are you using (pip freeze)?

beautifulsoup4
click
nltk
readability-lxml
requests
tqdm
numpy
ujson
spacy

I'll see if I can replicate what you're seeing on my machine.
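
The version details requested above can also be collected from inside Python. The following is a minimal sketch, not part of the newsroom repository: the function name environment_report is hypothetical, and it assumes Python 3.8 or newer, since importlib.metadata is not available in earlier releases. Packages that are not installed are reported as None.

```python
import sys
from importlib import metadata

# Dependency names copied from the list in this thread.
DEPENDENCIES = [
    "beautifulsoup4", "click", "nltk", "readability-lxml",
    "requests", "tqdm", "numpy", "ujson", "spacy",
]


def environment_report(packages=DEPENDENCIES):
    """Return a dict with the Python version and each package's installed version.

    Hypothetical helper for illustration; a package that is not installed
    maps to None instead of raising an error.
    """
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(name, version if version is not None else "(not installed)")
```

Pasting the printed lines into the issue gives roughly the same information as python --version plus pip freeze, limited to the packages listed above.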

anhduc114 commented 6 years ago

It seems the issue appears when I try to extract from dev.archive before it has finished downloading all the data. After I downloaded the full dev.archive, the extraction commands ran without problems.
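
One way to catch an incomplete download before running the extraction is to check that the archive can be read to the end. The sketch below assumes the archive is a gzip-compressed file, which this thread does not confirm, so treat that as an assumption to verify first. The function name archive_is_complete is hypothetical and not part of the newsroom tools.

```python
import gzip
import zlib


def archive_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Assumption: the archive is gzip-compressed. A truncated gzip stream makes
    Python's gzip module raise EOFError; corrupt data raises OSError or
    zlib.error. A missing file also returns False, since FileNotFoundError
    is a subclass of OSError.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use small.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    # "dev.archive" is the file name used earlier in this thread.
    print(archive_is_complete("dev.archive"))
```

A False result suggests the scrape should be resumed or rerun before calling newsroom-extract. A True result only shows the file is a well-formed gzip stream, not that every summary was downloaded.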