Open jdwinkler-lanzatech opened 2 years ago
Hi, thanks for the suggestion. genome_updater selects and filters data based on the assembly_summary.txt
file provided by NCBI (more info: https://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt). Besides the filter parameters, the -F
option allows custom filtering for data selection. However, I'm not sure the information you refer to is contained in that file.
Column 8 would be the target, I think. I believe the -F option currently does an exact match, though, so I am thinking of another flag that uses grep behind the scenes to do the matching. I'd want to grab all the assemblies with an organism name matching "methano*", if that makes sense. It obviously would not be perfect, but it could be handy if you have a specific enough search string.
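To make the idea concrete, here is a rough sketch of the kind of matching I mean, written in Python rather than grep. It is not how genome_updater works internally; `filter_by_organism` is a made-up helper, the rows are toy data, and it assumes the organism name sits in tab-separated column 8 of assembly_summary.txt with header lines starting with `#`.

```python
import re


def filter_by_organism(lines, pattern, column=8):
    """Yield rows (as lists of fields) whose organism name matches pattern.

    Hypothetical helper: `column` is 1-based and assumed to hold the
    organism name, as in NCBI's assembly_summary.txt.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        # Skip the comment/header lines at the top of the file.
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and regex.search(fields[column - 1]):
            yield fields


# Toy rows for illustration only, not real assembly records.
sample = [
    "# header line\n",
    "ACC_1\tPRJ1\tSAM1\t\tna\t2190\t2190\tMethanocaldococcus jannaschii\n",
    "ACC_2\tPRJ2\tSAM2\t\tna\t562\t562\tEscherichia coli\n",
]

# "^methano" is a case-insensitive prefix match on the organism name.
hits = list(filter_by_organism(sample, r"^methano"))
for fields in hits:
    print(fields[0], fields[7])
```

Note that a shell-style "methano*" would need to be written as a regular expression such as `^methano` here, since `*` has a different meaning in regex.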
Great, thanks! I figure it is a logical addition to the custom filtering offered by -F already.
Hi,
I was wondering if it would be possible to provide a filtering option based on assembly (species/assigned) name? I often want to pull a group of microbes with a general metabolic capability (say, methanogenesis), but currently I have to pick out the TaxIDs manually to do so. Not a major problem, but the feature might be useful for other people too!