rbloom5 / ImmuneRep


Map out CDR3 extraction and V-J germline assignments #3

Open · rbloom5 opened this issue 9 years ago

rbloom5 commented 9 years ago

Possible packages I know of that can do this:
- VDJfasta
- everything under "related sites" on this page: http://omictools.com/vdjfasta-s2267.html
- iHMMune-align: http://www.emi.unsw.edu.au/~ihmmune/index.php
- Cloanalyst or SoDA: http://www.bu.edu/computationalimmunology/research/software/

rzeller commented 9 years ago

I'm looking into Laserson's vdj Python package on GitHub, and I'm not sure I'm going to be able to get it working. There's basically no documentation, and it doesn't work out of the box.

rbloom5 commented 9 years ago

Yeah, I agree, it's pretty tough to get moving. I think a good place to start is /bin/summarize_full_data_pipeline.

I think that's the main file that runs everything. But there might be some work needed to make sure all the files are in the right place and that PATH includes all the relevant directories. Also, I don't quite understand how to start it through its OptionParser options.
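For anyone stuck on the OptionParser part: scripts built on Python's `optparse` module read their flags from the command line, and running the script with `--help` prints every option it defines. Below is a minimal sketch of how that parsing works. The option names (`--output`, `--verbose`) are made up for illustration and are not the actual options of summarize_full_data_pipeline.

```python
from optparse import OptionParser


def build_parser():
    # Hypothetical options, for illustration only; the real pipeline script
    # defines its own set, which `--help` will list.
    parser = OptionParser(usage="usage: %prog [options] INPUT_FILE")
    parser.add_option("-o", "--output", dest="output", default="out.tsv",
                      help="where to write results [default: %default]")
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                      default=False, help="print progress messages")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # parse_args() reads sys.argv[1:] by default; passing an explicit list
    # shows how a command line such as
    #   summarize_full_data_pipeline -o results.tsv reads.fasta
    # is split into named options and leftover positional arguments.
    options, args = parser.parse_args(["-o", "results.tsv", "reads.fasta"])
    print(options.output, options.verbose, args)
```

Anything that isn't a recognized flag ends up in `args`, so input files are usually passed positionally after the options.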

rzeller commented 9 years ago

Should I keep trying to get laserson/vdj working or would it be better to help Raman with AbMining? If the latter, how can I help you Raman?

ramanktalwar commented 9 years ago

I’ve actually got a pretty good handle on AbMining. I’m much more interested in getting VDJ working now. Let’s work on that.


rzeller commented 9 years ago

OK cool.

rzeller commented 9 years ago

The IMGT database link given in vdj/data/readme.md doesn't work.

rbloom5 commented 9 years ago

From vdj/data, IGHV.fasta, IGHD.fasta, and IGHJ.fasta should be all you need. These are the heavy-chain V, D, and J germline reference sequences that most programs align reads against.
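If you want to inspect those reference files directly, a small stdlib-only FASTA reader is enough to load them. This is a sketch, not code from the vdj package: it assumes ordinary FASTA with `>` header lines and sequences that may be wrapped over several lines, and the function name `read_fasta` is our own.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is returned without the leading '>', and wrapped sequence
    lines are joined into one uppercase string. Sequence lines that appear
    before the first header are discarded.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


# Usage (path assumed relative to a checkout of the vdj repository):
# with open("vdj/data/IGHJ.fasta") as handle:
#     ighj = dict(read_fasta(handle))
```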

rbloom5 commented 9 years ago

Also, from http://www.imgt.org/IMGTdownloads.html, I think the link "IMGT/LIGM-DB flat files or using FTP at EBI (UK)" should give you everything in one download. I think...
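Note that imgt.dat is not FASTA. As far as we know, the IMGT/LIGM-DB flat file uses an EMBL-style layout: each line starts with a two-letter code (`ID`, `DE`, `FT`, `SQ`, ...) and each entry ends with a `//` line. Under that assumption, a sketch for splitting the file into entries and reading each entry's identifier could look like this. `iter_flatfile_entries` and `entry_id` are hypothetical helpers, so check them against the actual file before relying on them.

```python
def iter_flatfile_entries(lines):
    """Yield each EMBL-style entry as a list of lines, split on '//' terminators."""
    entry = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a missing final terminator
        yield entry


def entry_id(entry):
    """Return the first token on the entry's 'ID' line (minus any ';'), or None."""
    for line in entry:
        if line.startswith("ID "):
            fields = line[2:].split()
            return fields[0].rstrip(";") if fields else None
    return None


# Usage (file name assumed):
# with open("imgt.dat") as handle:
#     ids = [entry_id(e) for e in iter_flatfile_entries(handle)]
```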

rzeller commented 9 years ago

I tried imgt.dat from the link above and I get the same error as before:

>>> from vdj import alignment
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vdj/alignment.py", line 9, in <module>
    import refseq
  File "vdj/refseq.py", line 108, in <module>
    header_data = parse_imgt_fasta_header(record)
  File "vdj/refseq.py", line 41, in parse_imgt_fasta_header
    data['allele'] = raw_data[1]
IndexError: list index out of range

Maybe I'm focusing too much on getting alignment.py working, but let me know if anybody gets past this.
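The traceback shows `parse_imgt_fasta_header` indexing `raw_data[1]` on a header that has fewer fields than expected. We haven't read refseq.py beyond what the traceback shows, but if it splits headers on `|` (IMGT reference FASTA headers are typically `|`-delimited, e.g. `>accession|IGHV1-2*01|Homo sapiens|F|...`), then any header from a non-IMGT FASTA file would yield a one-element list and fail exactly this way. Below is a hypothetical defensive version, named `parse_imgt_header` to keep it distinct from the real function, that reports the offending header instead of raising a bare IndexError.

```python
def parse_imgt_header(header):
    """Split an IMGT-style '|'-delimited FASTA header into named fields.

    Hypothetical illustration: only the first four fields (accession,
    allele, species, functionality) are extracted, and a ValueError naming
    the bad header is raised instead of an IndexError.
    """
    raw_data = header.lstrip(">").split("|")
    if len(raw_data) < 4:
        raise ValueError(
            "expected an IMGT '|'-delimited header with at least 4 fields, "
            "got %d: %r" % (len(raw_data), header)
        )
    return {
        "accession": raw_data[0],
        "allele": raw_data[1],
        "species": raw_data[2],
        "functionality": raw_data[3],
    }
```

Running something like this over every header in the reference FASTA that refseq.py loads would point to the record(s) triggering the error.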

rzeller commented 9 years ago

Never mind, I don't think imgt.dat is causing the issue. I'll give an update once I've tracked down the source of the error.

rbloom5 commented 9 years ago

To get VDJfasta working, download it and follow all the directions in the README. Also, if you have the latest version of HMMER, open VDJfasta.pm and delete "--allcol" on line 493. It may be simpler to install HMMER 3.0, which VDJfasta is known to be compatible with.
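If you'd rather patch VDJfasta.pm than downgrade HMMER, the edit can be scripted. A minimal sketch, assuming `--allcol` appears as a standalone option inside a command string; we haven't checked exactly how line 493 is written, so this searches for the token rather than trusting the line number, and it keeps a `.bak` copy of the original. `strip_allcol` and `patch_file` are hypothetical helper names.

```python
import re
import shutil

# Matches the option as a whole word plus any spaces/tabs before it (but not
# newlines), so 'hmmsearch --allcol $hmm' becomes 'hmmsearch $hmm'.
ALLCOL = re.compile(r"[ \t]*--allcol\b")


def strip_allcol(text):
    """Return text with every standalone --allcol option removed."""
    return ALLCOL.sub("", text)


def patch_file(path):
    """Remove --allcol from the file in place, backing it up to path + '.bak'.

    Returns the number of occurrences removed; the file is left untouched
    (and no backup is made) when there is nothing to remove.
    """
    with open(path) as handle:
        original = handle.read()
    patched, count = ALLCOL.subn("", original)
    if count:
        shutil.copyfile(path, path + ".bak")
        with open(path, "w") as handle:
            handle.write(patched)
    return count


# Usage (path assumed): patch_file("VDJfasta.pm")
```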