mischkew / jwpl

Automatically exported from code.google.com/p/jwpl

de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPages(PageQuery) should allow getting the number of pages #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPages(PageQuery) returned an
unmodifiable collection instead of an iterable, it would be possible to get the
number of pages using size(). That would be helpful, e.g., to display progress
information (x of y pages processed). Inheriting from AbstractCollection may
make this easier to implement.
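
To illustrate the idea, here is a minimal sketch of a read-only collection built on AbstractCollection. It is not JWPL code: the class SizedReadOnlyView and both of its constructor arguments are hypothetical, and the sketch assumes the page count can be obtained separately from iteration (for example, by a count query), which the issue does not confirm.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical helper, not part of JWPL: a read-only collection view whose
// size() is answered separately from iteration (assumed: e.g. a count query).
final class SizedReadOnlyView<T> extends AbstractCollection<T> {
    private final Supplier<Iterator<T>> iterators; // creates a fresh iterator per call
    private final IntSupplier counter;             // computes the number of elements

    SizedReadOnlyView(Supplier<Iterator<T>> iterators, IntSupplier counter) {
        this.iterators = iterators;
        this.counter = counter;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> delegate = iterators.get();
        // Wrap the delegate so that remove() is never forwarded to it.
        return new Iterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
            @Override public void remove() {
                throw new UnsupportedOperationException("read-only view");
            }
        };
    }

    @Override
    public int size() {
        return counter.getAsInt();
    }
    // add() is inherited from AbstractCollection and throws
    // UnsupportedOperationException, so the view cannot be modified.
}

final class SizedReadOnlyViewDemo {
    public static void main(String[] args) {
        // Stand-in data; a real implementation would iterate over pages.
        List<String> titles = Arrays.asList("Darmstadt", "Wikipedia", "Java");
        SizedReadOnlyView<String> pages =
                new SizedReadOnlyView<>(titles::iterator, titles::size);

        int total = pages.size(); // fetch once; the count may be expensive
        int done = 0;
        for (String title : pages) {
            done++;
            System.out.println(done + " of " + total + " pages processed: " + title);
        }
    }
}
```

Because AbstractCollection only requires iterator() and size(), the existing iteration logic could stay as it is. If the count is expensive to compute, callers would probably want to read size() once up front, as the demo does.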

Original issue reported on code.google.com by torsten....@gmail.com on 21 Sep 2010 at 4:10

GoogleCodeExporter commented 9 years ago
> If de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPages(PageQuery) returned an
> unmodifiable collection instead of an iterable, it would be possible to get the
> number of pages using size(). That would be helpful e.g. to display progress
> information (x of y pages processed). Inheriting from AbstractCollection may be
> helpful.

getPages(PageQuery) is _very_ slow. In its current form, it was never intended
for production use.
Are you going to use it?
If not, I am not sure whether it is worth the effort.

Original comment by torsten....@gmail.com on 21 Sep 2010 at 4:10

GoogleCodeExporter commented 9 years ago
The ExtendedWikipediaReader uses it to iterate over the pages. Since this
process takes quite long, it is convenient to know the total number of pages,
so that a progress meter can show the percentage already processed and
approximately how long the run will still take. The progress currently
displayed there uses the number of pages from the wiki metadata, which does
not seem to be the right place to look. The metadata reported about 900k
pages, but the query returned only about 500k.
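
The progress calculation described above can be sketched roughly as follows. This is not the ExtendedWikipediaReader's code: the class ProgressEstimate is hypothetical, and the estimate assumes that pages are processed at a roughly constant rate.

```java
import java.time.Duration;

// Hypothetical helper, not part of JWPL: percentage and remaining-time
// estimate for a run with a known total number of items.
final class ProgressEstimate {
    private final long total;

    ProgressEstimate(long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        this.total = total;
    }

    // Share of the work already done, in percent.
    double percent(long done) {
        return 100.0 * done / total;
    }

    // Linear extrapolation: remaining = elapsed * (total - done) / done.
    // Assumes a roughly constant processing rate.
    Duration remaining(long done, Duration elapsed) {
        if (done <= 0) {
            throw new IllegalArgumentException("need at least one processed item");
        }
        return elapsed.multipliedBy(total - done).dividedBy(done);
    }
}

final class ProgressEstimateDemo {
    public static void main(String[] args) {
        // Illustrative numbers only: 500k pages in total, 125k processed so far.
        ProgressEstimate progress = new ProgressEstimate(500_000);
        long done = 125_000;
        Duration elapsed = Duration.ofMinutes(5);

        System.out.println(progress.percent(done) + "% done, about "
                + progress.remaining(done, elapsed).toMinutes() + " min remaining");
    }
}
```

With a wrong total, both figures are off: using the 900k from the metadata for a query that yields about 500k pages, the meter would stop at roughly 56% and overestimate the remaining time throughout.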

I can change it in the Wikipedia API if there is no objection.

Original comment by torsten....@gmail.com on 21 Sep 2010 at 4:11

GoogleCodeExporter commented 9 years ago
Is this bug still open?

Original comment by oliver.ferschke on 1 Jun 2011 at 6:06

GoogleCodeExporter commented 9 years ago
I think I was the person who originally requested that feature. Well, I suppose 
if getPages() still returns an iterable, the issue is still open.

Original comment by richard.eckart on 2 Jun 2011 at 9:29