GTDB v214 was requested by one of our users on Discord, so I've added it to our available GTDB versions (and made it the default). Since none of the SCG names appear to have changed on the GTDB end of things, this was fairly straightforward.
The one annoying caveat is that GTDB slightly changed the structure of their representative sequence archives so that the FASTA files are now contained inside an inner folder called 'individual'. To make it compatible with the code for previous versions, I moved the FASTAs one directory level up:
    if self.ctx.target_database_release == 'v214.1':
        inner_path = os.path.join(self.ctx.msa_individual_genes_dir_path, 'individual')
        for file in glob.glob(inner_path + '/*.faa'):
            shutil.move(file, self.ctx.msa_individual_genes_dir_path)
        os.rmdir(inner_path)
In hindsight, it probably would have been simpler to append the inner directory to the self.ctx.msa_individual_genes_dir_path variable and move on. OH WELL. I will happily change it if even one person says 'that sounds better' to me. :)
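For comparison, the "append the inner directory" alternative could look roughly like the sketch below. The helper name `resolve_genes_dir` and its arguments are hypothetical and not part of this PR; in the real code the result would presumably be assigned back to self.ctx.msa_individual_genes_dir_path.

```python
import os


def resolve_genes_dir(base_dir, release):
    """Return the directory expected to hold the per-gene FASTA files.

    Hypothetical helper: for releases that nest the FASTAs in an inner
    'individual' folder, point at that folder instead of moving files.
    """
    if release == 'v214.1':
        return os.path.join(base_dir, 'individual')
    return base_dir
```

This leaves the downloaded archive untouched, at the cost of downstream code seeing a different path depending on the release.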
Regardless, it will be a bit annoying if the next release of GTDB also has this new archive structure, because then we will have to remember to update our if statement. Formatting inconsistencies are the mosquitoes of data download: typically harmless, but they make your life just a little bit worse.
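One way to avoid revisiting the if statement for each release might be to check for the inner folder itself rather than the version string. This is only a sketch under that assumption: `flatten_inner_dir` is a hypothetical name, and it assumes the inner folder is always called 'individual' and holds `.faa` files, as in v214.1.

```python
import glob
import os
import shutil


def flatten_inner_dir(genes_dir, inner_name='individual'):
    """Move *.faa files from genes_dir/inner_name up into genes_dir.

    Does nothing if the inner folder is absent, so archives with the
    older flat layout pass through unchanged. Returns the number of
    files moved.
    """
    inner_path = os.path.join(genes_dir, inner_name)
    if not os.path.isdir(inner_path):
        return 0

    moved = 0
    for fasta in glob.glob(os.path.join(inner_path, '*.faa')):
        shutil.move(fasta, genes_dir)
        moved += 1

    # Only remove the inner folder if nothing else is left in it.
    if not os.listdir(inner_path):
        os.rmdir(inner_path)
    return moved
```

Calling it a second time on the same directory is harmless, since the inner folder is already gone and the function returns 0.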