rnajena / viralclust

Small pipeline to cluster viral genomes based on their k-mer content. WiP
GNU General Public License v3.0

NCBI Accession IDs enhancement #4

Closed klamkiew closed 2 years ago

klamkiew commented 3 years ago

If several NCBI IDs are found in the header, store all of them and check later whether one of them exists in the ncbi_metainfo.pkl file.
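A minimal sketch of that idea, not the pipeline's actual code. The regex is an assumed, simplified accession pattern, `candidate_ids` and `resolve_id` are hypothetical helper names, and `metainfo` stands in for the dictionary loaded from ncbi_metainfo.pkl. Whether that file is keyed with or without the version suffix is not stated here, so the sketch tries both.

```python
import re

# Assumption: a simplified pattern for GenBank/RefSeq-style nucleotide
# accessions (e.g. MN908947.3, NC_045512.2). The regex used by the
# pipeline may differ.
ACCESSION_RE = re.compile(r"\b[A-Z]{1,2}_?\d{5,9}(?:\.\d+)?\b")


def candidate_ids(header):
    """Return every accession-like token in a FASTA header, in order."""
    return [m.group(0) for m in ACCESSION_RE.finditer(header)]


def resolve_id(header, metainfo):
    """Return the first candidate that is a key of `metainfo`, else None.

    `metainfo` is a stand-in for the dict loaded from ncbi_metainfo.pkl.
    Each candidate is tried with and without its version suffix.
    """
    for acc in candidate_ids(header):
        for key in (acc, acc.split(".")[0]):
            if key in metainfo:
                return key
    return None


# Hypothetical header and metadata, for illustration only.
header = ">MN908947.3 Wuhan-Hu-1 | NC_045512.2"
metainfo = {"NC_045512": {"species": "example"}}
print(candidate_ids(header))
print(resolve_id(header, metainfo))
```

Keeping all candidates and deferring the lookup means a header with two accessions is no longer rejected outright. It is resolved only if at least one candidate is actually known.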

klamkiew commented 2 years ago

Potential downstream bug: if the regex finds no candidate ID, or more than one, the name of the sequence is left unchanged. However, long names seem to cause visualization issues in nw_display, which breaks the .svg and .pdf files.
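One possible workaround, sketched under assumptions: when no unique ID can be extracted, fall back to a shortened, sanitized label instead of the full header. `safe_label` is a hypothetical helper, and the default limit of 30 characters is a guess, since the length at which nw_display starts to misbehave is not given here.

```python
import hashlib
import re


def safe_label(name, max_len=30):
    """Shorten and sanitize a sequence name for use as a tree leaf label.

    Assumption: max_len=30 is an arbitrary limit chosen for illustration.
    Whitespace and characters with special meaning in Newick are replaced
    by underscores. Truncated names get a short hash suffix so that two
    long names sharing a prefix remain distinguishable.
    """
    cleaned = re.sub(r"[\s()\[\],:;'\"]+", "_", name.strip()).strip("_")
    if len(cleaned) <= max_len:
        return cleaned
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:6]
    # Prefix + "_" + 6 hex characters adds up to exactly max_len.
    return f"{cleaned[:max_len - 7]}_{digest}"


# Hypothetical header, for illustration only.
long_name = "Severe acute respiratory syndrome coronavirus 2 isolate X, complete genome"
print(safe_label(long_name))
```

The original header would need to be kept in a separate mapping so the shortened labels can be traced back to their sequences.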