wikimedia / drafttopic

Predicting topics for new drafts based on WikiProjects on English Wikipedia
https://drafttopic.readthedocs.io
MIT License

Tweak fetch_text to get initial revisions #23

Closed. adamwight closed this 6 years ago.

adamwight commented 6 years ago

Retrieves each article's initial revision rather than its latest one. Initial revisions that contain a redirect are skipped rather than followed.
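A minimal sketch of the approach described above, assuming the MediaWiki Action API at https://en.wikipedia.org/w/api.php. This is not necessarily how drafttopic's fetch_text.py does it. It requests one revision per title with rvdir=newer, so the oldest (initial) revision comes back, and treats wikitext that starts with #REDIRECT as a redirect to skip. The helper names and the redirect regex are hypothetical, chosen for illustration.

```python
import re
from typing import Dict, Iterator, Tuple

# Hypothetical redirect check: enwiki redirects start with "#REDIRECT" (any case).
REDIRECT_RE = re.compile(r"^\s*#redirect", re.IGNORECASE)


def initial_revision_params(title: str) -> Dict[str, str]:
    """Build API query params that return a page's oldest (initial) revision.

    rvlimit/rvdir only work in single-page mode, so send one title per request.
    """
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvdir": "newer",    # enumerate revisions oldest-first...
        "rvlimit": "1",      # ...and keep only the first one
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def initial_texts(response: dict) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs, skipping redirects instead of following them."""
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing page or no visible revision
        text = revisions[0].get("slots", {}).get("main", {}).get("content", "")
        if REDIRECT_RE.match(text):
            continue  # initial revision is a redirect: skip it
        yield page["title"], text


if __name__ == "__main__":
    # Hand-written response in formatversion=2 shape (fabricated for the demo).
    fake = {"query": {"pages": [
        {"title": "A", "revisions": [{"slots": {"main": {"content": "Draft text"}}}]},
        {"title": "B", "revisions": [{"slots": {"main": {"content": "#REDIRECT [[A]]"}}}]},
    ]}}
    print(list(initial_texts(fake)))  # [('A', 'Draft text')]
```

Skipping redirects, rather than resolving them, avoids attributing a redirect target's text to the original title. It does mean those titles drop out of the output.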

I ran this and verified the output at ores-staging-01.eqiad.wmflabs:/srv/home/awight/drafttopic:

wc -l
     93455 datasets/enwiki.labeled_wikiprojects.json
     84481 datasets/enwiki.labeled_wikiprojects.w_text.json
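The same before/after check can be scripted. This is a minimal sketch that assumes both files are JSON lines, with one record per newline-terminated line, so that the line count equals the record count. The helper names are hypothetical.

```python
def count_lines(path: str) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 16), b""))


def dropped_records(before: str, after: str) -> int:
    """How many more lines the input file has than the text-augmented output."""
    return count_lines(before) - count_lines(after)
```

Applied to the two files above, dropped_records("datasets/enwiki.labeled_wikiprojects.json", "datasets/enwiki.labeled_wikiprojects.w_text.json") would return 93455 - 84481 = 8974 for the counts shown.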

Bug: T193834

codecov[bot] commented 6 years ago

Codecov Report

Merging #23 into master will decrease coverage by 6.85%. The diff coverage is 3.44%.


@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
- Coverage   53.37%   46.52%   -6.86%     
==========================================
  Files          17       16       -1     
  Lines         755      733      -22     
==========================================
- Hits          403      341      -62     
- Misses        352      392      +40
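As a check on the table, assuming coverage is hits divided by tracked lines:

$$
\text{coverage} = \frac{\text{hits}}{\text{lines}}, \qquad \text{misses} = \text{lines} - \text{hits}
$$

$$
\text{master: } \frac{403}{755} \approx 53.38\%, \quad 755 - 403 = 352
$$

$$
\text{PR \#23: } \frac{341}{733} \approx 46.52\%, \quad 733 - 341 = 392
$$

$$
46.52\% - 53.38\% \approx -6.86 \text{ percentage points}
$$

The unrounded values are 53.3775% and 46.5211%. The report's 53.37% and its "decrease by 6.85%" appear consistent with truncating to two decimals. The table's -6.86% matches rounding.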
Impacted Files                                  Coverage Δ
drafttopic/feature_lists/wordvectors.py         0% <ø> (ø) ↑
drafttopic/utilities/fetch_text.py              0% <0%> (-59.75%) ↓
drafttopic/utilities/wikiprojects_common.py     57.14% <100%> (ø) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 768eb42...049430c.

halfak commented 6 years ago

Just a couple of notes. Otherwise looks good to me.