nextstrain / dengue

Nextstrain build for dengue virus
https://nextstrain.org/dengue

[wip] Do Not Merge - Pull correct reference titles #8

Closed j23414 closed 9 months ago

j23414 commented 1 year ago

Description of proposed changes

This PR tries to fix a problem in ingest/bin/genbank-url: line 71 fetches the GenBank definition line (DEFINITION) instead of the reference title. There is some more context in this Slack thread.

https://github.com/nextstrain/dengue/blob/e74fc6b8e43bd7efc7af7cb079f83809f90d2f88/ingest/bin/genbank-url#L71

An example GenBank Reference title is at:

https://github.com/nextstrain/dengue/blob/0591cf83fe81b9f75225e492651186d75fd09694/config/reference_dengue_all.gb#L15
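To illustrate the difference between the two fields, here is a minimal sketch that pulls both the DEFINITION line and the reference TITLE entries out of a GenBank flat-file record using only the standard library. It is not the code in ingest/bin/genbank-url; the function name `parse_definition_and_titles` and the sample record (accession XX000000, authors, titles) are made up for illustration. It assumes the standard flat-file layout, where field text starts in column 13 and continuation lines are indented by 12 spaces.

```python
# Hypothetical sample record, not real GenBank data.
SAMPLE_RECORD = "\n".join([
    "LOCUS       XX000000     60 bp    RNA     linear   VRL 01-JAN-2000",
    "DEFINITION  Dengue virus 1 isolate EXAMPLE, complete",
    "            genome.",
    "ACCESSION   XX000000",
    "REFERENCE   1  (bases 1 to 60)",
    "  AUTHORS   Doe,J. and Roe,R.",
    "  TITLE     A hypothetical study of dengue",
    "            virus genomes",
    "  JOURNAL   Unpublished",
    "REFERENCE   2  (bases 1 to 60)",
    "  AUTHORS   Doe,J.",
    "  TITLE     Direct Submission",
    "  JOURNAL   Submitted (01-JAN-2000) Example Institute",
    "//",
])

CONTINUATION = " " * 12  # continuation lines carry no keyword, only indentation


def parse_definition_and_titles(record_text):
    """Return (definition, [reference titles]) for one flat-file record."""
    definition_parts = []
    titles = []
    current = None  # field that a continuation line would belong to

    for line in record_text.splitlines():
        if line.startswith("DEFINITION"):
            definition_parts = [line[12:].strip()]
            current = "definition"
        elif line.startswith("  TITLE"):
            titles.append([line[12:].strip()])
            current = "title"
        elif line.startswith(CONTINUATION) and current is not None:
            if current == "definition":
                definition_parts.append(line.strip())
            else:
                titles[-1].append(line.strip())
        else:
            # Any other keyword ends the field we were collecting.
            current = None

    return " ".join(definition_parts), [" ".join(parts) for parts in titles]


definition, titles = parse_definition_and_titles(SAMPLE_RECORD)
print("DEFINITION:", definition)
print("TITLES:", titles)
```

A record can carry several REFERENCE blocks, so the sketch returns a list of titles; which one to keep (for example, skipping "Direct Submission") is a separate decision that this example does not make.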

To fetch the correct titles, this commit (099600cc4e9a2c2f5f34210b9a373c88c3e940f9):

However, performing this looped fetch across ~11K GenBank entries is time-consuming, so this PR is kept separate from the new_ingest PR in case there are speedups that I'm not seeing. Open to suggestions.
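One possible speedup, sketched here as a suggestion and not as what this PR does, is to request records in batches. NCBI's E-utilities `efetch` endpoint accepts a comma-separated list of IDs, so ~11K accessions can be covered in far fewer requests than one per entry. The helper names `chunked` and `efetch_url`, the batch size of 200, and the placeholder accessions are assumptions for illustration; the sketch only builds the URLs and does not perform any network calls.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(accessions):
    """Build one efetch URL that returns GenBank flat files for a batch."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "gb",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


# Placeholder accessions standing in for the ~11K real ones.
accessions = [f"XX{i:06d}" for i in range(11000)]
urls = [efetch_url(batch) for batch in chunked(accessions, 200)]
print(len(urls), "requests instead of", len(accessions))
```

Each response would then contain many records separated by `//` lines, so the titles still have to be parsed per record after the fetch.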

Related issue(s)

Testing

j23414 commented 9 months ago

Closing, since we seem to be moving away from using the "titles" and "journal" fields.