sammyjava closed this issue 1 year ago.
Agreed that the behavior when no arguments are supplied is underwhelming, but try:
docker compose -f compose.yml -f compose.prod.yml run redis_loader gff --help
and see if that --helps at all
Thanks, but I'm less interested in getting the job done than in getting the documentation improved. I know I can ask you, @adf-ncgr, for an example. :)
But head me off at the pass if you don't want me to file newbie issues.
Well, I just wanted you to consider whether the additional usage text for gff mode was sufficient for a newbie; probably not!
And what IS a chromosome GFF? Just a GFF with the @chromosome records? Is that a separate GFF? Seriously, I've never heard of a "chromosome GFF" before, but I'm no bioinformatician. I presume that's why I got the error below. Bottom line: a full example with example files is GOLD. Then I could see what a "chromosome GFF" is, etc.
[shokin@shokin-gcv gcv-docker-compose]$ sudo docker compose -f compose.yml -f compose.prod.yml run redis_loader gff --genus=Phaseolus --species=dumosus --strain=PI311196.G19833 --gene-gff /falafel/shokin/ph-pangenome/liftoff/PI311196/G19833/phadu.PI311196.gnm1.phavu.G19833.gnm2.ann1.gff3
[+] Building 0.0s (0/0)
[+] Creating 1/0
✔ Container gcv-redis-1 Running 0.0s
[+] Building 0.0s (0/0)
"chromosomeIdx" already exists in RediSearch
Data will be appended to index "chromosomeIdx"
"geneIdx" already exists in RediSearch
Data will be appended to index "geneIdx"
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 354, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 350, in main
args.command(loader, args)
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 49, in gff
args.chromosome_gff,
^^^^^^^^^^^^^^^^^^^
AttributeError: 'Namespace' object has no attribute 'chromosome_gff'
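The AttributeError above comes from the gff command reading args.chromosome_gff, which is not present in the parsed arguments when --chromosome-gff is omitted. As a hypothetical sketch (not redis_loader's actual parser; only the option names are taken from the commands in this thread, and whether --gfa is optional is an assumption), declaring the option required would make argparse stop with a usage message instead of a traceback:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the gff options used in this thread."""
    parser = argparse.ArgumentParser(prog="redis_loader")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    gff = subparsers.add_parser("gff", help="load genes and chromosomes from GFF files")
    gff.add_argument("--genus", required=True)
    gff.add_argument("--species", required=True)
    gff.add_argument("--strain", required=True)
    gff.add_argument("--gene-gff", required=True, help="annotation GFF3 with gene features")
    # required=True makes argparse print usage and exit with status 2 when the
    # option is missing, instead of the command failing later with AttributeError.
    gff.add_argument("--chromosome-gff", required=True,
                     help="GFF3 with one chromosome/supercontig feature per sequence")
    # Assumed optional here; the thread does not say whether it is required.
    gff.add_argument("--gfa", help="gene family assignment TSV")
    return parser
```

Parsing ["gff", "--genus", "Phaseolus", ...] without --chromosome-gff then exits with a "the following arguments are required" message rather than a Python traceback.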
I'm not going to presume to nominate exemplar files for the GitHub repo at this time, but you can have a look at: /falafel/legumeinfo/data/v2/Aeschynomene/evenia/genomes/CIAT22838.gnm1.XF73/aesev.CIAT22838.gnm1.XF73.genome_main.gff3.gz
Ahhhh, the ol' /genomes/ GFF file, rings a bell! Thanks! So, of course, the next question: is there a standard script for building those from the genome FASTA? Like fasta2gff or something? (Asking because, if there is, it should be added to the docs here: not everyone knows what a "chromosome GFF" is, but they likely have their chromosomes in a multi-FASTA.)
I use /falafel/adf/sw/hacks/lis_fasta2gff3.pl, although it doesn't attempt to solve the "what is a chromosome" question. There's another approach here: https://github.com/legumeinfo/datastore-specifications/blob/main/scripts/chrlen_to_gff.sh
Since @alancleary graduated, I think I've been forbidden from adding perl scripts to the GCV repos...
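Neither script ships with GCV, so purely as an illustration: a minimal Python sketch of the same idea, assuming the output should mirror the genome_main.gff3 files under /genomes/ (one feature per FASTA record, spanning 1 to the sequence length, with ID and Name set to the sequence ID). The function names are hypothetical. Like the Perl script, it does not decide "what is a chromosome": every record gets the same feature type unless you pass another one, and it does not escape special characters in IDs as the GFF3 spec requires.

```python
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence ID, sequence length) for each record in a FASTA stream."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # The ID is the first whitespace-delimited token after ">".
            seq_id, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def chromosome_gff(lines: Iterable[str], feature_type: str = "chromosome") -> Iterator[str]:
    """Yield GFF3 lines with one whole-sequence feature per FASTA record."""
    yield "##gff-version 3"
    for seq_id, length in fasta_lengths(lines):
        # GFF3 columns are tab-separated:
        # seqid source type start end score strand phase attributes
        cols = [seq_id, ".", feature_type, "1", str(length), ".", ".", ".",
                f"ID={seq_id};Name={seq_id}"]
        yield "\t".join(cols)
```

For example, `with open("phaac.Tep23.gnm1.genome_main.fna") as fh: print("\n".join(chromosome_gff(fh)))` writes a chromosome GFF to stdout. Reading line by line keeps memory flat even for large genomes, since only lengths are accumulated.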
[shokin@dal datastore-specifications]$ scripts/chrlen_to_gff.sh ~/Phaseolus/acutifolius/genomes/Tep23.gnm1/phaac.Tep23.gnm1.genome_main.fna phaac.Tep23.gnm1
##gff-version 3
scripts/chrlen_to_gff.sh: line 39: type: unbound variable
[shokin@dal Tep23.gnm1]$ cat phaac.Tep23.gnm1.genome_main.fna | /falafel/adf/sw/hacks/lis_fasta2gff3.pl -type=chromosome > phaac.Tep23.gnm1.genome_main.gff3
works fine.
I think I'm gonna quit this exercise.
[shokin@shokin-gcv gcv-docker-compose]$ sudo docker compose -f compose.yml -f compose.prod.yml run redis_loader gff --genus=Phaseolus --species=dumosus --strain=PI311196.G19833 --gene-gff=/falafel/shokin/ph-pangenome/liftoff/PI311196/G19833/phadu.PI311196.gnm1.phavu.G19833.gnm2.ann1.gff3 --chromosome-gff=/falafel/gepts_lab/legumeinfo/Phaseolus/dumosus/genomes/PI311196.gnm1/phadu.PI311196.gnm1.genome_main.gff3 --gfa=/falafel/shokin/ph-pangenome/liftoff/PI311196/G19833/phadu.PI311196.gnm1.phavu.G19833.gnm2.ann1.gfa.tsv
[+] Building 0.0s (0/0)
[+] Creating 1/0
✔ Container gcv-redis-1 Running 0.0s
[+] Building 0.0s (0/0)
"chromosomeIdx" already exists in RediSearch
Data will be appended to index "chromosomeIdx"
"geneIdx" already exists in RediSearch
Data will be appended to index "geneIdx"
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 354, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 350, in main
args.command(loader, args)
File "/usr/local/lib/python3.11/site-packages/redis_loader/__main__.py", line 44, in gff
loadFromGFF(
File "/usr/local/lib/python3.11/site-packages/redis_loader/loaders/gff.py", line 125, in loadFromGFF
transferChromosomes(redisearch_loader, genus, species, chromosome_gff)
File "/usr/local/lib/python3.11/site-packages/redis_loader/loaders/gff.py", line 29, in transferChromosomes
gffutils.create_db(
File "/usr/local/lib/python3.11/site-packages/gffutils/create.py", line 1359, in create_db
iterator = iterators.DataIterator(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gffutils/iterators.py", line 314, in DataIterator
raise ValueError(
ValueError: /falafel/gepts_lab/legumeinfo/Phaseolus/dumosus/genomes/PI311196.gnm1/phadu.PI311196.gnm1.genome_main.gff3 cannot be found and does not appear to be a URL
[shokin@shokin-gcv gcv-docker-compose]$ cat /falafel/gepts_lab/legumeinfo/Phaseolus/dumosus/genomes/PI311196.gnm1/phadu.PI311196.gnm1.genome_main.gff3
##gff-version 3
phadu.PI311196.gnm1.Chr01 . chromosome 1 59123956 . . . ID=phadu.PI311196.gnm1.Chr01;Name=phadu.PI311196.gnm1.Chr01
phadu.PI311196.gnm1.Chr02 . chromosome 1 61340039 . . . ID=phadu.PI311196.gnm1.Chr02;Name=phadu.PI311196.gnm1.Chr02
phadu.PI311196.gnm1.Chr03 . chromosome 1 59791010 . . . ID=phadu.PI311196.gnm1.Chr03;Name=phadu.PI311196.gnm1.Chr03
phadu.PI311196.gnm1.Chr04 . chromosome 1 61329659 . . . ID=phadu.PI311196.gnm1.Chr04;Name=phadu.PI311196.gnm1.Chr04
phadu.PI311196.gnm1.Chr05 . chromosome 1 52062745 . . . ID=phadu.PI311196.gnm1.Chr05;Name=phadu.PI311196.gnm1.Chr05
phadu.PI311196.gnm1.Chr06 . chromosome 1 33783503 . . . ID=phadu.PI311196.gnm1.Chr06;Name=phadu.PI311196.gnm1.Chr06
phadu.PI311196.gnm1.Chr07 . chromosome 1 64926652 . . . ID=phadu.PI311196.gnm1.Chr07;Name=phadu.PI311196.gnm1.Chr07
phadu.PI311196.gnm1.Chr08 . chromosome 1 76278163 . . . ID=phadu.PI311196.gnm1.Chr08;Name=phadu.PI311196.gnm1.Chr08
phadu.PI311196.gnm1.Chr09 . chromosome 1 45873673 . . . ID=phadu.PI311196.gnm1.Chr09;Name=phadu.PI311196.gnm1.Chr09
phadu.PI311196.gnm1.Chr10 . chromosome 1 54228812 . . . ID=phadu.PI311196.gnm1.Chr10;Name=phadu.PI311196.gnm1.Chr10
phadu.PI311196.gnm1.Chr11 . chromosome 1 66020725 . . . ID=phadu.PI311196.gnm1.Chr11;Name=phadu.PI311196.gnm1.Chr11
phadu.PI311196.gnm1.Super-Scaffold_27_32 . supercontig 1 8582530 . . . ID=phadu.PI311196.gnm1.Super-Scaffold_27_32;Name=phadu.PI311196.gnm1.Super-Scaffold_27_32
[shokin@shokin-gcv gcv-docker-compose]$
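The file above appears to be what the loader expects as a chromosome GFF: one GFF3 feature per sequence (type chromosome or supercontig), starting at 1 and ending at the sequence length. A minimal sanity check, assuming that layout (the start-at-1 rule is inferred from this example, not from any loader documentation; the function name is hypothetical), could look like:

```python
from typing import Dict, Iterable


def read_chromosome_gff(lines: Iterable[str]) -> Dict[str, int]:
    """Return {seqid: length}, raising ValueError on rows that break the assumed layout."""
    lengths: Dict[str, int] = {}
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version 3
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
        seqid, _source, ftype, start, end = cols[:5]
        # Assumption: each feature spans its whole sequence, so it starts at 1.
        if start != "1":
            raise ValueError(f"line {n}: {ftype} {seqid} should start at 1, not {start}")
        lengths[seqid] = int(end)
    return lengths
```

The column-count check also catches files whose tabs were converted to spaces, which GFF3 parsers will not accept.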
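I think this is simply the script not being able to see paths outside the container. Try bind-mounting the directory, e.g. -v /falafel:/falafel on the docker compose run command (--bind /falafel:/falafel is the Singularity/Apptainer spelling), or something similar.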
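The cat succeeds on the host while the loader reports "cannot be found" because the container only sees host directories that are bind-mounted into it. The sketch below models that mapping in plain Python, as an illustration of Docker's -v HOST:CONTAINER semantics; it is not part of GCV or the Docker API, and the mount tables in the examples are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict


def to_container_path(host_path: str, mounts: Dict[str, str]) -> str:
    """Translate a host path into the path a container sees.

    `mounts` maps host directories to container directories, mirroring
    `-v HOST:CONTAINER` options. Raises ValueError when the path is under no
    mount, i.e. the file is invisible inside the container.
    """
    path = PurePosixPath(host_path)
    # Check the most specific (deepest) host directory first when mounts overlap.
    for host_dir in sorted(mounts, key=lambda d: len(PurePosixPath(d).parts), reverse=True):
        if path.is_relative_to(host_dir):
            return str(PurePosixPath(mounts[host_dir]) / path.relative_to(host_dir))
    raise ValueError(f"{host_path} is not under any bind mount; the container cannot see it")
```

With {"/falafel": "/falafel"}, every /falafel path keeps the same name inside the container, which is why an identity mount is the least surprising choice: the paths in the command line work unchanged.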
Whatever. I'm not really keen to learn how to build a GCV; I just thought I'd give it a quick shot. I think I'll close this issue and open a new one saying that a HOWTO for building from a GFF would be helpful. Then I'll review that.
Are there any examples of loading a GFF file into Redis from scratch? I'm mystified. I can't figure out how to do it! I'm one of those users who really need an end-to-end example of what to do. Here's what I've tried. I also don't know what a "chromosome GFF" is. My chromosomes are in a multi-FASTA. :) I just used the annotation GFF3 that has the chromosomes as sequences; I'm guessing I need to build a "chromosome GFF" somehow. :)
[Note: I'm hitting this like an unknown GitHub user, hopefully to help flesh out the documentation a bit.]
I'm running redis_loader 1.2.3, schema 1.1.0.
OK, from that I discern I use a command like this: