Closed elijgarcia closed 3 years ago
wow, you're using paired loci, that's awesome! If you see anything (else...) weird at all, do please open an issue -- it's stable enough that nothing big will change in terms of behavior, but it is still a bit bleeding edge.
As to your issue, yep I just don't have --airr-output in testing, so hadn't noticed that I needed to update it to use --paired-outdir rather than --outfname. That's a quick fix I should get to today, I just need to clean up something first.
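For readers following along, the kind of check being described can be sketched roughly as below. This is a hypothetical illustration, not the actual code in partis's processargs.py: the function name `check_airr_output_args`, the parser setup, and the wording of the paired-loci error message are assumptions. Only the `--outfname` message is taken from the traceback in this thread.

```python
import argparse


def make_parser():
    # Minimal stand-in parser with only the four options discussed in this thread.
    parser = argparse.ArgumentParser()
    parser.add_argument('--airr-output', action='store_true')
    parser.add_argument('--paired-loci', action='store_true')
    parser.add_argument('--paired-outdir')
    parser.add_argument('--outfname')
    return parser


def check_airr_output_args(args):
    """Hypothetical check (not the real partis code): require the output
    location that matches the run mode when --airr-output is set."""
    if not args.airr_output:
        return
    if args.paired_loci:
        # Assumed message wording; paired runs write to a directory, not a single file.
        if args.paired_outdir is None:
            raise Exception('have to set --paired-outdir if --airr-output and --paired-loci are set')
    elif args.outfname is None:
        # Message copied from the traceback reported in this issue.
        raise Exception('have to set --outfname if --airr-output is set')
```

With a check shaped like this, a paired-loci run that sets --paired-outdir passes, while a single-chain run without --outfname still fails as before.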
Also, that's great that you're working on reformatting output for olmsted -- that's been on our todo list for far too long, so we'd definitely be interested in incorporating your changes if you'd submit pull requests when you're done.
ok this should do it.
It seems to be working quite well, we often use 10x sequencing on our sorting of PBMCs so it's great that you added that feature. Thank you for your speedy response and fix!
Oh I'm simply trying to use the tools that those creators made! It's quite amazing what they have done.
I am still getting the same error:
Traceback (most recent call last):
File "/opt/applications/partis/0.16.0/gnu/bin/partis", line 1066, in <module>
processargs.process(args)
File "/opt/applications/partis/0.16.0/gnu/python/processargs.py", line 250, in process
raise Exception('have to set --outfname if --airr-output is set')
Exception: have to set --outfname if --airr-output is set
Traceback (most recent call last):
File "/opt/applications/partis/0.16.0/gnu/bin/partis", line 1070, in <module>
args.func(args)
File "/opt/applications/partis/0.16.0/gnu/bin/partis", line 260, in run_partitiondriver
run_all_loci(args)
File "/opt/applications/partis/0.16.0/gnu/bin/partis", line 749, in run_all_loci
run_step('cache-parameters', ltmp, auto_cache=True, skip_missing_input=True)
File "/opt/applications/partis/0.16.0/gnu/bin/partis", line 507, in run_step
utils.simplerun(' '.join(prep_args(ltmp)), dryrun=args.dry_run)
File "/opt/applications/partis/0.16.0/gnu/python/utils.py", line 3458, in simplerun
subprocess.check_call(cmd_str if shell else cmd_str.split(), env=os.environ, shell=shell)
File "/opt/applications/python/2.7.11/gnu/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['/opt/applications/partis/0.16.0/gnu/bin/partis', 'cache-parameters', '--locus', 'igh', '--infname', './malaria/mouse_nterm_klh_2021/out2home-3/out126-airr/igh.fa', '--species', 'mouse', '--airr-output', '--parameter-dir', './malaria/mouse_nterm_klh_2021/out2home-3/out126-airr/parameters/igh', '--input-metafname', './malaria/mouse_nterm_klh_2021/out2home-3/out126-airr/meta.yaml', '--sw-cachefname', './malaria/mouse_nterm_klh_2021/out2home-3/out126-airr/parameters/igh/sw-cache.yaml']' returned non-zero exit status 1
Is there a way to convert or create a copy of the partition-igh.yaml? I saw there was a closed issue that led to the created --airr-output argument, but I wasn't sure if there was a way to do the conversion after a partition was run.
hmm, that really shouldn't be possible now if --paired-loci is set. Could you run with --print-git-commit to make sure you picked up the most recent version?
I am still getting the same Exception error, and I got the following from --print-git-commit:
commit: 625898dbbc9f96398954e10f236ef24f5f4e78a8
tag: 0.16.0 (well, 397 commits ahead of)
That is on my personal computer, where I can pull new docker images quite easily. The high performance computing core at the institute I work at recently updated to partis/0.16.0, and I believe they are on commit 376. I did ask them to change that one line of code, though, and I am still getting the error.
hmmm that's super weird. That is the correct commit hash, but the exception is coming from the old code -- in the trace above it's at line 250, which is where it was in the last docker image, but in the code from that commit hash it's at line 255.
Oh sorry for the confusion, the error stdout above was from the HPC node that only had that line changed at 250. But when I had the latest commit on the docker container, it referenced line 255
whoops, sorry, you must be auto parameter caching (i.e. not running a separate cache-parameters step first), I forgot to check that possibility. This should do it. It should finish building on docker hub in a half hour or so.
It's working great on my end now, thank you for the speedy fix! Is there a benefit to running the cache-parameters/annotation/partition steps individually? From the user standpoint it might be easier to diagnose an issue (although your stdout when there is an error is generally very helpful!), but I'm wondering what your opinion/logic on it is.
Great!
There might be some useful thoughts here. But mostly the reason I almost always run a separate cache-parameters step is that it's safer, particularly in the context of production/real data. If I run them separately, especially with --refuse-to-cache-parameters set for partitioning, then I can be sure that the right options were used for parameter caching, that it was run on all sequences, and that the parameters went where I expect them to. For instance I'm usually running several different flavors of partitioning (different seed sequences, different random subsamples, different stopping criteria) on the same cached parameters. Or for instance if you change the sequences in the input file without changing its name, things will be completely wrong if run as one step (since it'll use the old parameters), but fine if you cache parameters separately.
If you're just running once without setting any special command line args, running as one step is fine, but if you're doing more complicated things it's probably safer to do two steps.
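The two-step approach described above can be sketched as a small script that builds both command lines. This is only an illustration: the helper name `two_step_commands` and the file paths are hypothetical, and the only options used are ones that already appear in this thread (--infname, --parameter-dir, --outfname, --refuse-to-cache-parameters).

```python
import shlex


def two_step_commands(infname, parameter_dir, outfname, extra_partition_args=()):
    """Build a separate cache-parameters command and a partition command that
    reuses the same parameter dir and refuses to re-cache parameters."""
    cache_cmd = ['partis', 'cache-parameters',
                 '--infname', infname,
                 '--parameter-dir', parameter_dir]
    partition_cmd = ['partis', 'partition',
                     '--infname', infname,
                     '--parameter-dir', parameter_dir,
                     '--outfname', outfname,
                     '--refuse-to-cache-parameters']
    partition_cmd += list(extra_partition_args)
    return cache_cmd, partition_cmd


# Hypothetical paths, printed as shell commands rather than executed.
for cmd in two_step_commands('seqs.fa', '_output/parameters', '_output/partition.yaml'):
    print(' '.join(shlex.quote(arg) for arg in cmd))
```

Because the partition command carries --refuse-to-cache-parameters, a missing or misplaced parameter dir should show up as an error instead of silently triggering a fresh caching step. Several partition flavors can be generated against the same cached parameters by varying `extra_partition_args` and `outfname`.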
I see, thank you for your insight!
Hi, I have a quick question on the same topic - AIRR output with paired-loci. The options --paired-loci and --airr-output work very well for my run, but the output tsv file is saved under the folder "single-chain". Could you clarify which file is the results of the paired chain clonal type partition? Many thanks :-)
Each airr output tsv corresponds to the regular partition yaml next to which it appears -- i.e. the airr tsvs in the single-chain/ dir are for single chain partitions, while the joint/paired partitions are in the main output dir: https://github.com/psathyrella/partis/blob/main/docs/paired-loci.md#output-directory.
e.g. this
./bin/partis partition --paired-indir test/paired/ref-results/test/simu --parameter-dir test/paired/ref-results/test/parameters/simu --paired-outdir _output/tmp-pair --paired-loci --airr-output
gives this dir structure:
[thneed] partis/ > find _output/tmp-pair -type f
_output/tmp-pair/partition-igh.yaml
_output/tmp-pair/single-chain/partition-igl.tsv
_output/tmp-pair/single-chain/partition-igk.yaml
_output/tmp-pair/single-chain/partition-igh.yaml
_output/tmp-pair/single-chain/partition-igk.tsv
_output/tmp-pair/single-chain/partition-igl.yaml
_output/tmp-pair/single-chain/partition-igh.tsv
_output/tmp-pair/igh+igk/partition-igk.yaml
_output/tmp-pair/igh+igk/partition-igh.yaml
_output/tmp-pair/igh+igk/partition-igk.tsv
_output/tmp-pair/igh+igk/partition-igh.tsv
_output/tmp-pair/partition-igh.tsv
_output/tmp-pair/igh+igl/partition-igl.tsv
_output/tmp-pair/igh+igl/partition-igh.yaml
_output/tmp-pair/igh+igl/partition-igl.yaml
_output/tmp-pair/igh+igl/partition-igh.tsv
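To make the layout above easier to navigate, here is a minimal sketch that groups the AIRR tsv files by where they sit in the paired output dir. The function name `group_airr_tsvs` and the group labels are hypothetical. It assumes only what is stated above: tsvs directly in the main dir go with the joint/paired partitions, and those in single-chain/ go with the single chain partitions. Files in the igh+igk/ and igh+igl/ subdirs are collected separately; see the linked paired-loci docs for what those contain.

```python
from pathlib import Path


def group_airr_tsvs(paired_outdir):
    """Group partition-*.tsv files by their location under the paired output dir."""
    root = Path(paired_outdir)
    groups = {'joint': [], 'single-chain': [], 'per-pair': []}
    for tsv in sorted(root.rglob('partition-*.tsv')):
        rel = tsv.relative_to(root)
        if len(rel.parts) == 1:
            # Sits directly in the main output dir: joint/paired partition.
            groups['joint'].append(rel.as_posix())
        elif rel.parts[0] == 'single-chain':
            groups['single-chain'].append(rel.as_posix())
        else:
            # e.g. igh+igk/ or igh+igl/
            groups['per-pair'].append(rel.as_posix())
    return groups
```

For the listing above, calling `group_airr_tsvs('_output/tmp-pair')` would put partition-igh.tsv under 'joint' and the three single-chain/ tsvs under 'single-chain'.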
Thanks for rapid response!! Looking at my results, I have the igh+igk/l folders, but I don't have any .yaml or .tsv files in these folders. I only have .fa files in the paired chain folders. That's why I was confused about the results.
Perhaps I made a mistake here? Please see the following:
bin/partis partition --infname /mydata/data/$SAM/filtered_contig.fasta \
  --paired-loci \
  --airr-output \
  --paired-outdir /mydata/results/$SAM \
  --plotdir /mydata/figs/$SAM \
  --get-selection-metrics
Thank you so much for your help.
can you paste the full std out?
I am having issues outputting AIRR-formatted yaml's when running partis partition on BCR sequences that were made with the 10x pipeline. I am able to run the partition just fine using --paired-loci and --paired-outdir. However, when I specify that I want --airr-output, it raises the exception that I must set --outfname, but that's not possible when using the --paired-loci argument. I am trying to reformat the output so I can use it for the olmsted project. Are there alternative ways to reformat paired loci data into the AIRR format?
An example of the code I'm running:
partis partition --infname ./path-to/filtered_contig_123.fasta --paired-loci --species mouse --airr-output --paired-outdir ./path-to/out123-1
Which will then raise the Exception: have to set --outfname if --airr-output is set