poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

Getting more information out of trident list #285

Open TCLamnidis opened 9 months ago

TCLamnidis commented 9 months ago

After talking with Stephan today, we thought it would be nice to have an option to get more information about each package out of trident list. This would streamline integration into data analysis and into downstream processes (like thetis).

For reference, below is the information contained in the POSEIDON.yml files of two packages. In a TSV, each package would be one row and each field one column; here the fields are listed one per line:

2018_OlaldeNature
  poseidon_version: 2.5.0
  package_dir: /Users/lamnidis/poseidon_packages/community-archive/2018_OlaldeNature
  title: 2018_OlaldeNature
  description: Ancient genomes from the Bell Beaker period in Europe. Originally AADR v42.4.
  contributors: Ayshin Ghalichi (ghalichi@shh.mpg.de)
  package_version: 2.1.1
  last_modified: 2023-07-11
  genotype_format: PLINK
  geno_file: 2018_OlaldeNature.bed
  geno_file_chksum: e11e8a7ef0b74e964732db0cbe5046f4
  snp_file: 2018_OlaldeNature.bim
  snp_file_chksum: 7a7ef4d4f9c78a0bba32a329b6162dbd
  ind_file: 2018_OlaldeNature.fam
  ind_file_chksum: 95f51d4ef3797b556e6c0154bf8d443d
  snp_set: 1240K
  janno_file: 2018_OlaldeNature.janno

2018_Lamnidis_Fennoscandia
  poseidon_version: 2.5.0
  package_dir: /Users/lamnidis/poseidon_packages/community-archive/2018_Lamnidis_Fennoscandia
  title: 2018_Lamnidis_Fennoscandia
  description: Ancient genomes from Finland and Russia.
  contributors: Thiseas Lamnidis (lamnidisi@shh.mpg.de)
  package_version: 2.1.0
  last_modified: 2023-07-04
  genotype_format: PLINK
  geno_file: 2018_Lamnidis_Fennoscandia.bed
  geno_file_chksum: 74d8d52d45a0d2f6ed1212af5d2f4268
  snp_file: 2018_Lamnidis_Fennoscandia.bim
  snp_file_chksum: 10fe736b07171086524ec92dc5e06a22
  ind_file: 2018_Lamnidis_Fennoscandia.fam
  ind_file_chksum: 90c1b106d15bceccc1e25c34d3060d75
  snp_set: 1240K
  janno_file: 2018_Lamnidis_Fennoscandia.janno

The remaining columns (janno_file_chksum, sequencing_source_file, sequencing_source_file_chksum, bib_file, bib_file_chksum, readme_file, changelog_file) are empty for both packages.

trident list --remote --packages --raw already shows some of this information, so adding the remaining fields as extra columns in the output, behind a dedicated flag, would do the trick.
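To make the proposal concrete, here is a minimal Python sketch (an illustration, not trident's code) of turning already-parsed POSEIDON.yml contents into TSV rows with the columns from the example above. It assumes a hypothetical nested layout: a sub-mapping such as genotype_data holding genotype_format, geno_file and so on, and contributors given as a list of name/email mappings. The helper names flatten, format_contributor, package_row and packages_to_tsv are likewise hypothetical. As suggested in the thread, package_dir is not read from the YAML but derived from the directory that contains the file.

```python
import csv
import io
import os

# Column order copied from the example TSV in this issue.
COLUMNS = [
    "poseidon_version", "package_dir", "title", "description", "contributors",
    "package_version", "last_modified", "genotype_format",
    "geno_file", "geno_file_chksum", "snp_file", "snp_file_chksum",
    "ind_file", "ind_file_chksum", "snp_set",
    "janno_file", "janno_file_chksum",
    "sequencing_source_file", "sequencing_source_file_chksum",
    "bib_file", "bib_file_chksum", "readme_file", "changelog_file",
]


def flatten(meta):
    """Merge nested mappings (e.g. a hypothetical 'genotype_data' block)
    into one flat dict, without prefixing their keys."""
    flat = {}
    for key, value in meta.items():
        if isinstance(value, dict):
            flat.update(flatten(value))
        else:
            flat[key] = value
    return flat


def format_contributor(person):
    """Render one contributor as 'Name (email)', or just 'Name' if no email."""
    email = person.get("email")
    return f"{person['name']} ({email})" if email else person["name"]


def package_row(yml_path, meta):
    """Build one TSV row from the path to a POSEIDON.yml and its parsed content."""
    flat = flatten(meta)
    # package_dir is implicit: the directory holding the POSEIDON.yml file.
    flat["package_dir"] = os.path.dirname(os.path.abspath(yml_path))
    # Several contributors are joined into a single cell with "; ".
    people = flat.get("contributors") or []
    flat["contributors"] = "; ".join(format_contributor(p) for p in people)
    # Missing or null fields become empty cells, as in the example TSV.
    return ["" if flat.get(col) is None else str(flat[col]) for col in COLUMNS]


def packages_to_tsv(packages):
    """packages: iterable of (path_to_POSEIDON.yml, parsed_dict) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for yml_path, meta in packages:
        writer.writerow(package_row(yml_path, meta))
    return buf.getvalue()
```

Merging nested keys without a prefix keeps the header identical to the example above (geno_file rather than genotype_data_geno_file). The trade-off is that a key name repeated at two levels would silently overwrite the other, which prefixing would avoid at the cost of longer column names.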

nevrome commented 9 months ago

Sounds like a good idea to me :+1:

What would be a solid interface for this? Just a flag (--verbose?) that adds all of these columns to the output? Or a more sophisticated argument to request specific columns?

TCLamnidis commented 9 months ago

IMO spitting out all the info in the YAML file with one flag is enough. It's easy to select a subset of columns downstream if need be. The main thing for my use case here is the package_dir column, which is not in the YAML but is implicitly known (as the path to the file).
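If the flag simply emits every field, picking out a subset downstream takes only a few lines. A minimal Python sketch, assuming the extended output is a tab-separated table with a header row that uses column names like those in the example (package_dir, title, ...); the function name select_columns is hypothetical:

```python
import csv
import io


def select_columns(tsv_text, wanted):
    """Keep only the requested columns of a header-first TSV, in the order given."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    # Fail loudly on typos instead of silently returning empty cells.
    missing = [col for col in wanted if col not in header]
    if missing:
        raise KeyError(f"columns not in TSV header: {missing}")
    return [{col: row[col] for col in wanted} for row in reader]
```

Assuming the extended trident list output has this shape, it could be captured (for instance with Python's subprocess module) and passed to select_columns with ["package_dir", "title"] to keep just the package locations. This supports the "one flag, select downstream" design: trident stays simple, and the column choice lives in the analysis script.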

stschiff commented 8 months ago

This depends on an update to the server API: since list also supports --remote, any listing we add here must also be possible from the server. So this issue is somewhat contingent on #272 and #273, which I'm working on.