howisonlab / software-mentions-dataset-analysis

Analyses of software mentions and dependencies
GNU General Public License v3.0

Check for multiple non-default parses #10

Open willbeason opened 1 month ago

willbeason commented 1 month ago

Per #9, we're including the JATS, GROBID, and other parses. Each is produced for only a small set of papers (~1%), but those sets may overlap. Where a paper has more than one such parse, we'll need to decide whether to keep all of them or let one "win" and discard the others.

willbeason commented 1 month ago

Confirmed that these exist:

./f6/73/a4/00f673a4-d339-479d-9760-3c4a6f9decec/00f673a4-d339-479d-9760-3c4a6f9decec.grobid.tei.software.json
./f6/73/a4/00f673a4-d339-479d-9760-3c4a6f9decec/00f673a4-d339-479d-9760-3c4a6f9decec.pub2tei.tei.software.json

Now we'll need to decide what to do with these.
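
For reference, a rough sketch of how the overlapping papers could be listed. It assumes every non-default parse is named `<uuid>.<parser>.tei.software.json` (as in the two paths above) and sits in a per-paper directory; the function name `find_multi_parse_papers` and the treatment of any other file name as "not a non-default parse" are assumptions for illustration, not something taken from the repo.

```python
from collections import defaultdict
from pathlib import Path

# Assumed common suffix, based on the two example paths in this issue.
SUFFIX = ".software.json"


def find_multi_parse_papers(root):
    """Return {paper_dir: [parser, ...]} for directories holding more than one non-default parse."""
    parsers_by_dir = defaultdict(set)
    for path in Path(root).rglob("*" + SUFFIX):
        # Strip the suffix: "<uuid>.<parser>.tei.software.json" -> "<uuid>.<parser>.tei"
        stem = path.name[: -len(SUFFIX)]
        parts = stem.split(".")
        if len(parts) < 2:
            # No parser segment in the name; assumed not to be a non-default parse.
            continue
        parsers_by_dir[path.parent].add(parts[1])
    # Keep only directories where at least two different parsers produced output.
    return {d: sorted(p) for d, p in parsers_by_dir.items() if len(p) > 1}


if __name__ == "__main__":
    for paper_dir, parsers in sorted(find_multi_parse_papers(".").items()):
        print(paper_dir, ", ".join(parsers))
```

Run from the dataset root, this prints one line per paper directory with its parser names, which should give a count of how many papers are affected before we pick a policy.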