refgenie / refgenconf

A Python object for standardized reference genome assets.
http://refgenie.databio.org
BSD 2-Clause "Simplified" License

default seek keys #60

Closed nsheff closed 5 years ago

nsheff commented 5 years ago

Right now, seeking the fasta asset doesn't return the FASTA file, because it expects you to type fasta.fasta:

 refgenie seek hg38/fasta
/ext/yeti/refgenomes/hg38/fasta/default
nsheff@puma:/project/shefflab/www/refgenie_raw$ refgenie seek hg38/fasta.fasta
/ext/yeti/refgenomes/hg38/fasta/default/hg38.fa

See, the vanilla fasta key is just pointing to the folder.

Do we want to enable a default when there are keys present? I thought if the name of the seek key matched the asset name, then the repetition shouldn't be required?
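
For reference, the strings in this thread follow a genome/asset.seek_key:tag pattern in which the seek key and the tag are optional. Here is a minimal sketch of splitting such a string, assuming that pattern holds. The parse_registry_path name and the regular expression are illustrative only and are not refgenie's own parser:

```python
import re

# Hypothetical pattern: genome/asset, then optional ".seek_key" and ":tag".
_REGISTRY_PATH = re.compile(
    r"^(?P<genome>[^/]+)/(?P<asset>[^.:]+)"
    r"(?:\.(?P<seek_key>[^:]+))?"
    r"(?::(?P<tag>.+))?$"
)


def parse_registry_path(text):
    """Split 'genome/asset.seek_key:tag' into a dict; absent parts are None."""
    match = _REGISTRY_PATH.match(text)
    if match is None:
        raise ValueError("not a registry path: {!r}".format(text))
    return match.groupdict()


# "hg38/fasta" carries no seek key, while "hg38/fasta.fasta" names one.
print(parse_registry_path("hg38/fasta"))
print(parse_registry_path("hg38/fasta.fasta"))
```

The question in this issue is what the first form should resolve to when its seek_key part is missing.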

nsheff commented 5 years ago

Related issue:

refgenie getseq -g hg38 -l chr1:5-10
Traceback (most recent call last):
  File "/home/nsheff/.local/lib/python3.5/site-packages/pyfaidx/__init__.py", line 359, in __init__
    if mutable else 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/ext/yeti/refgenomes/hg38/fasta/default'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nsheff/.local/bin/refgenie", line 10, in <module>
    sys.exit(main())
  File "/home/nsheff/.local/lib/python3.5/site-packages/refgenie/refgenie.py", line 659, in main
    refgenie_getseq(rgc, args.genome, args.locus)
  File "/home/nsheff/.local/lib/python3.5/site-packages/refgenie/refgenie.py", line 506, in refgenie_getseq
    fa = pyfaidx.Fasta(rgc.get_asset(genome, "fasta"))
  File "/home/nsheff/.local/lib/python3.5/site-packages/pyfaidx/__init__.py", line 996, in __init__
    build_index=build_index)
  File "/home/nsheff/.local/lib/python3.5/site-packages/pyfaidx/__init__.py", line 368, in __init__
    "Cannot read FASTA file %s" % filename)
pyfaidx.FastaNotFoundError: Cannot read FASTA file /ext/yeti/refgenomes/hg38/fasta/default

nsheff commented 5 years ago

getseq appears to be seeking fasta instead of fasta.fasta.

I think we should make fasta work as it used to.

stolarczyk commented 5 years ago

We might do that.

It is a question of what behavior a user expects. I've explicitly implemented it this way:

https://github.com/databio/refgenconf/blob/b9f67339aee733eed853dd537a0149eb715d8ed6/refgenconf/refgenconf.py#L851-L855

My reasoning was: if the asset is a directory (we decided to point to it with "."), I can refer to it with no seek_key. Otherwise, if it is a set of files that each have their own seek_key, I have to name the one I mean explicitly.

nsheff commented 5 years ago

I don't see a disadvantage of:

If the asset has seek keys, and one of them has the same name as the asset, then that key is the default when no key is provided.

You could still get the behavior you are proposing by adding a self-named seek key that points to the folder.

But with your method we cannot do what we want in most cases, which is to point to a file without typing a seek key even when seek keys are defined (fasta is a good example).
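
The rule proposed in this comment can be sketched as follows. This is only an illustration under stated assumptions, not the refgenconf implementation: the seek_path function, its arguments, and the "hg38.fa.fai" file name are hypothetical, and falling back to the asset directory when no self-named key exists is an assumption based on the output shown at the top of this issue.

```python
import posixpath


def seek_path(asset_dir, asset_name, seek_keys, seek_key=None):
    """Resolve a path inside an asset directory (hypothetical helper).

    seek_keys maps seek key names to paths relative to asset_dir.
    """
    if seek_key is None:
        if asset_name in seek_keys:
            # Proposed default: a seek key named after the asset is used
            # when the caller does not provide one.
            seek_key = asset_name
        else:
            # Assumption: with no self-named key, return the directory.
            return asset_dir
    # normpath collapses a "." target so a directory pointer stays clean.
    return posixpath.normpath(posixpath.join(asset_dir, seek_keys[seek_key]))


fasta_keys = {"fasta": "hg38.fa", "fai": "hg38.fa.fai"}  # file names are illustrative
fasta_dir = "/ext/yeti/refgenomes/hg38/fasta/default"
print(seek_path(fasta_dir, "fasta", fasta_keys))         # no key given: self-named key
print(seek_path(fasta_dir, "fasta", fasta_keys, "fai"))  # explicit key
```

A directory-style asset keeps its current behavior by defining a self-named key whose target is ".", which is the workaround mentioned above.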

nsheff commented 5 years ago

That's working now, but shouldn't it list the non-keyed version in the asset list?

Local assets:
           hg38/   fasta.chrom_sizes:default, fasta.fai:default, fasta.fasta:default
          rCRSd/   bowtie2_index:default, fasta.chrom_sizes:default, fasta.fai:default, fasta.fasta:default

I think it should just say fasta:default instead of fasta.fasta:default. In other words, fasta.fasta should not be a thing; it should just be fasta.
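
One way to express that listing rule, as a rough sketch rather than refgenie's actual list code (the asset_label function is hypothetical):

```python
def asset_label(asset, seek_key, tag):
    """Format one entry of the asset list (hypothetical helper).

    A seek key that repeats the asset name is collapsed, so the entry
    reads "fasta:default" rather than "fasta.fasta:default".
    """
    name = asset if seek_key == asset else "{}.{}".format(asset, seek_key)
    return "{}:{}".format(name, tag)


labels = [asset_label("fasta", key, "default") for key in ("chrom_sizes", "fai", "fasta")]
print(", ".join(labels))
# prints: fasta.chrom_sizes:default, fasta.fai:default, fasta:default
```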

stolarczyk commented 5 years ago

fixed

[mstolarczyk@MichalsMBP test_genomes]: refgenie list -c genomes.yaml
Local genomes: mouse_chrM2x, rCRSd
Local recipes: fasta, bowtie2_index, bwa_index, hisat2_index, bismark_bt2_index, bismark_bt1_index, kallisto_index, salmon_index, epilog_index, star_index, gencode_gtf, ensembl_gtf, ensembl_rb, refgene_anno, feat_annotation
Local assets:
   mouse_chrM2x/   bowtie2_index:default, fasta.chrom_sizes:default, fasta.fai:default, fasta:default
          rCRSd/   bowtie2_index:default, fasta.chrom_sizes:test, fasta.fai:test, fasta:test