greenelab / pubtator

Retrieve and process PubTator annotations

Add lfs tracking and bug fix #17

Closed danich1 closed 7 years ago

danich1 commented 7 years ago

This PR fixes a bug where pandas was writing the header row of each data chunk to the output file. As a result, database insertion was wonky. It also adds Git LFS tracking for the PubTator tags, as mentioned in #4. @dhimmel, let me know what you think.
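
For readers unfamiliar with this class of bug, here is a minimal sketch of one way to write chunked pandas output so the header appears only once. It is not the code from this PR: the helper name `write_chunks` and the column names in the usage note are hypothetical, and the tab separator is an assumption based on the `.tsv` extension.

```python
import io

import pandas as pd


def write_chunks(chunks, path):
    """Write an iterable of DataFrame chunks to one TSV file.

    Hypothetical helper for illustration only. The header is written
    with the first chunk; later chunks are appended without it.
    """
    for i, chunk in enumerate(chunks):
        first = i == 0
        chunk.to_csv(
            path,
            sep="\t",
            mode="w" if first else "a",  # truncate once, then append
            header=first,                # header only for the first chunk
            index=False,
        )
```

Usage might look like `write_chunks(pd.read_csv(source, sep="\t", chunksize=100000), "tags.tsv")`, where `source` is any TSV input. Passing `header=True` on every call instead would reproduce the repeated-header symptom described above.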

dhimmel commented 7 years ago

Running this command for pubtator-hetnet-tags.tsv.xz

xzcat pubtator-hetnet-tags.tsv.xz | wc --lines

Returns 73079035. 73 million tags. Not bad.
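
The same count can be sketched in Python for systems where `xzcat` or the GNU `wc --lines` long option is unavailable. The function name `count_lines` is hypothetical. Like `wc --lines`, it counts newline characters, so if the file begins with a header row the number of tags would be one less than the value returned.

```python
import lzma


def count_lines(path, block_size=1 << 20):
    """Count newline characters in an xz-compressed file.

    Hypothetical helper, roughly equivalent to `xzcat path | wc --lines`.
    Reads fixed-size blocks so the whole file never sits in memory.
    """
    count = 0
    with lzma.open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for block in iter(lambda: handle.read(block_size), b""):
            count += block.count(b"\n")
    return count
```

For example, `count_lines("pubtator-hetnet-tags.tsv.xz")` would be the counterpart of the shell command above.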