Closed danich1 closed 7 years ago
This PR fixes a bug where pandas was writing the header row of every data chunk to the output file. As a result, the stray header rows made database insertion wonky. It also adds Git LFS tracking for the PubTator tags, as mentioned in #4. @dhimmel, let me know what you think.
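For readers unfamiliar with this kind of bug: it typically shows up when `DataFrame.to_csv` is called in append mode for every chunk while `header` is left at its default of `True`. Below is a minimal sketch of one way to avoid it. It is not the code from this PR; the function name `write_chunks` and its arguments are hypothetical, and it assumes the chunks are ordinary pandas DataFrames (for example from `pandas.read_csv(..., chunksize=...)`).

```python
import pandas as pd


def write_chunks(chunks, out_path, sep="\t"):
    """Write DataFrame chunks to out_path, emitting the header row only once.

    `chunks` is any iterable of DataFrames (hypothetical helper, not from the PR).
    """
    for i, chunk in enumerate(chunks):
        chunk.to_csv(
            out_path,
            sep=sep,
            index=False,
            mode="w" if i == 0 else "a",  # overwrite for the first chunk, append afterwards
            header=(i == 0),              # header row only for the first chunk
        )
```

With this pattern the output file has a single header line followed by the data rows of all chunks, so a downstream bulk insert does not see column names in the middle of the data.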
Running this command on pubtator-hetnet-tags.tsv.xz:
xzcat pubtator-hetnet-tags.tsv.xz | wc --lines
Returns 73079035. 73 million tags. Not bad.
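The same check can be done from Python without decompressing the file to disk. This is only a sketch using the standard-library `lzma` module; the helper name `count_lines` is hypothetical and not part of the repository.

```python
import lzma


def count_lines(path):
    """Count the lines of an xz-compressed text file by streaming through it.

    Unlike `wc --lines`, a final line without a trailing newline is also counted.
    """
    with lzma.open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Either way, the count covers every line in the file, so if the file starts with a header row, the number of tags is one less than the reported figure.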