phbradley / tcr-dist

Software tools for the analysis of epitope-specific T cell receptor (TCR) repertoires (scroll down for the README)
MIT License

Single Chain parsed_seq Input Generating Blank Output File #28

Closed cajames2 closed 6 years ago

cajames2 commented 6 years ago

Hello,

I am trying to run TCR-dist on a data set of parsed TCR alpha chains.

An abridged version of my data set is here in .txt format: clones_file.txt

The code I used to run the basic analysis script is as follows: python /Users/cajames2/tcr-dist/run_basic_analysis.py --organism human --parsed_seqs_file /Users/cajames2/TCRSeq/clones_file.tsv --make_fake_beta --make_fake_quals

The script then runs all the way through, but produces blank tables and plots. I ran the test data set "test_small_human_pairseqs_v1_parsed_seqs.tsv" and saw outputs. I also deleted the beta chain columns and quality scores and ran only the alpha chain information with --make_fake_beta and --make_fake_quals, and it worked just fine.

I think I have traced the issue to something that the _parse_tsvfile function depends on. I modified the parse_tsv.py script to print the _allclones file so that I could see whether my data was being read correctly. This file is blank after I run the run_basic_analysis.py script. However, when I run the parse_tsv.py script on my data independently, it reads my data and generates a populated _allclones file.

Do you have any insight into why the _parse_tsvfile function won't read my data when run in the context of the run_basic_analysis.py script, but works just fine when run independently?

jeremycfd commented 6 years ago

There are a couple of issues preventing your analysis. First, the file you attached does not include the cdr3a_nucseq column, which is required for a number of steps in the analysis; when it is missing, those steps generate errors that you can find in the .err files. Second, --make_fake_beta and --make_fake_quals will only function properly when using --pair_seqs_file for your data input; if you are going to parse the sequences yourself, you will also need to generate your own fake qualities and fake beta chains. I'm attaching a file with an example parsed ABtcr that might be helpful in that regard: FakesExample.txt
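If it helps, both checks above can be done quickly before rerunning the pipeline. The following is only a rough sketch and not part of tcr-dist: the helper names missing_columns and nonempty_err_files are made up for illustration, and it assumes the parsed seqs file is tab-separated with a single header row and that the .err files sit together in one directory. Only cdr3a_nucseq is named above as required; any other column you pass in is your own assumption.

```python
import glob
import os


def missing_columns(tsv_path, required):
    """Return the names in `required` that are absent from the TSV header row.

    Hypothetical helper: assumes the first line of the file is a
    tab-separated header.
    """
    with open(tsv_path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    present = set(name.strip() for name in header)
    return [name for name in required if name not in present]


def nonempty_err_files(directory):
    """Return sorted paths of *.err files in `directory` that are not empty.

    Hypothetical helper: assumes the .err files are written directly into
    `directory` (not into subdirectories).
    """
    pattern = os.path.join(directory, "*.err")
    return sorted(path for path in glob.glob(pattern)
                  if os.path.getsize(path) > 0)
```

For example, missing_columns("clones_file.tsv", ["cdr3a_nucseq"]) returns an empty list only if that column is present in the header, and nonempty_err_files(".") lists the .err files in the current directory that are worth opening first.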

Hope this helps. Feel free to email me directly if you need more assistance getting things to work.

Jeremy