grp-bork / spire_contribute


missing metadata for a small minority #3

Closed wwood closed 9 months ago

wwood commented 10 months ago

Hi,

It seems 70 genomes are missing metadata, e.g.:

$ grep spire_v1_095_000273129 spire_v1_cluster_metadata.tsv
spire_v1_095_000273129  7       0       7       NA      NA      NA      NA      NA      NA      NA

$ grep spire_mag_01907768 spire_v1_genome_metadata.tsv
spire_mag_01907768      spire_v1_095_000273129  95_ANI  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      208     kingdom 0       0       0.08 1TRUE    Unclassified    NA      NA      NA      NA      NA      NA      NA      NA

It's easy enough for me to rerun these through the tools myself since I have the genomes, but it might be useful to fix for other people. Thanks.

spire_mag_01907768
spire_mag_01905358
spire_mag_02371632
spire_mag_02050351
spire_mag_02050022
spire_mag_02399131
spire_mag_02438678
spire_mag_02223008
spire_mag_02219863
spire_mag_02374340
spire_mag_02371291
spire_mag_02371371
spire_mag_02977139
spire_mag_01598896
spire_mag_01598897
spire_mag_01598898
spire_mag_01598899
spire_mag_01598902
spire_mag_01598908
spire_mag_01598911
spire_mag_01598912
spire_mag_01598913
spire_mag_01869855
spire_mag_01905356
spire_mag_01905357
spire_mag_02040160
spire_mag_02040435
spire_mag_02050005
spire_mag_02050048
spire_mag_02050124
spire_mag_02050134
spire_mag_02050170
spire_mag_02050232
spire_mag_02050253
spire_mag_02050369
spire_mag_02073124
spire_mag_02073251
spire_mag_02073285
spire_mag_02073310
spire_mag_02165471
spire_mag_02219849
spire_mag_02219913
spire_mag_02220384
spire_mag_02220435
spire_mag_02221509
spire_mag_02221987
spire_mag_02222100
spire_mag_02222554
spire_mag_02222622
spire_mag_02222881
spire_mag_02223597
spire_mag_02224155
spire_mag_02234385
spire_mag_02275738
spire_mag_02275773
spire_mag_02292464
spire_mag_02292963
spire_mag_02296742
spire_mag_02296792
spire_mag_02297347
spire_mag_02298387
spire_mag_02299463
spire_mag_02370239
spire_mag_02372667
spire_mag_02372797
spire_mag_02374249
spire_mag_02402555
spire_mag_02600909
spire_mag_02659344
spire_mag_02976516
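
For anyone who wants to reproduce a list like the one above, here is a minimal sketch of one way to flag such rows. It assumes the cluster metadata file is tab-separated, that the first column is the ID, and that the annotation fields start at the fifth column, which is inferred only from the example row shown earlier in this comment. The function name `rows_missing_metadata` and the parameter `first_annot_col` are hypothetical, not part of any SPIRE tooling.

```python
import csv
import sys
from typing import Iterable, List


def rows_missing_metadata(
    lines: Iterable[str], first_annot_col: int = 4, na: str = "NA"
) -> List[str]:
    """Return the first-column ID of every row whose fields from
    first_annot_col onward are all equal to the NA marker.

    first_annot_col=4 is an assumption based on the single example row
    in this issue, not on documentation of the file format.
    """
    missing = []
    for fields in csv.reader(lines, delimiter="\t"):
        annotations = fields[first_annot_col:]
        # Skip short rows; flag rows where every annotation field is NA.
        if annotations and all(field == na for field in annotations):
            missing.append(fields[0])
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_missing.py spire_v1_cluster_metadata.tsv
    with open(sys.argv[1], newline="") as handle:
        for cluster_id in rows_missing_metadata(handle):
            print(cluster_id)
```

A header row, if the file has one, is not flagged as long as its column names are not all `NA`.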
wwood commented 10 months ago

Running these genomes through CheckM2, they all seem to be low quality (<50%).
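
A minimal sketch of how such a check could be scripted, assuming the CheckM2 report is a tab-separated file with `Name` and `Completeness` columns, and reading "<50%" as completeness (the comment does not say which metric was meant). The function name `low_quality_genomes` is hypothetical.

```python
import csv
import sys
from typing import Iterable, List


def low_quality_genomes(
    lines: Iterable[str], max_completeness: float = 50.0
) -> List[str]:
    """Return genome names whose completeness is below max_completeness.

    Assumes a tab-separated report with a header row containing
    'Name' and 'Completeness' columns.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row["Name"]
        for row in reader
        if float(row["Completeness"]) < max_completeness
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python low_quality.py quality_report.tsv
    with open(sys.argv[1], newline="") as handle:
        for name in low_quality_genomes(handle):
            print(name)
```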

fullama commented 9 months ago

Once again, sorry about that. I didn't filter out the failed CheckM ones when I made the query to get the representatives.

I have updated the file with these bad ones removed.

Thanks again!