Genotype inference app in Tapis #85

Closed bcorrie closed 2 years ago

bcorrie commented 3 years ago

VDJBase based genotype inference using Docker/singularity.

schristley commented 3 years ago

Hey @bcorrie, I started porting your implementation scripts onto TACC (stampede2). Do you have some example data (with known output) that I can use for test runs?

bcorrie commented 3 years ago

If you go to this query page: https://gateway.ireceptor.org/sequences?query_id=45382 and download the data, you should be able to use that as the download_file input to the App. That is essentially what the Gateway does - downloads the data and ZIPs it. The App expects a Gateway download ZIP file.
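
Before submitting a download as the download_file input, it can help to confirm that the file really is a ZIP archive and to see what it contains. The sketch below is only an illustration: the function name `list_download_members` is hypothetical, and nothing is assumed about the internal layout of a Gateway download beyond it being a ZIP.

```python
import zipfile
from pathlib import Path


def list_download_members(zip_path):
    """Return the file names inside a ZIP archive, or raise if it is not a ZIP.

    Hypothetical helper: the Gateway's actual archive layout is not assumed here.
    """
    path = Path(zip_path)
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    with zipfile.ZipFile(path) as archive:
        # Skip directory entries and keep only regular members.
        return [info.filename for info in archive.infolist() if not info.is_dir()]
```

Calling `list_download_members("my_download.zip")` (a placeholder file name) returns the member names so they can be checked by eye before a long-running job is started.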

If you want to have a look at our crude App implementation, you can give this a try: https://gateway-analysis.ireceptor.org/sequences?query_id=8434

At the bottom of the sequence list, you should get a list of Apps and you can run VDJBase to get the VDJBase outputs from the same data set. You can then compare those outputs to what you get on TACC. This is our analysis test/development Gateway so it may be flaky depending on when you run the job 8-) Your account should work here.

VDJBase is computationally pretty expensive, and the job is only using 1 CPU, so don't run it on a large data set unless you want to wait 8-) Even on 10K sequences it takes 15 minutes or so... I am running a test job now and will let you know when it's done, with a snippet of some examples, assuming it works.

I am away tomorrow so won't be able to respond to questions until Monday.

bcorrie commented 3 years ago

FYI, the repertoire used above is one that Gur's group already has on VDJBase, so we are using it as a test case: we run VDJBase to produce a genotype on the repertoire from the Gateway and compare that against what is stored on VDJBase for the same repertoire.

bcorrie commented 3 years ago

Job took 30 minutes.

First 10 lines of genotype.tsv file from the analysis. There should be an OGRDB PDF report and a CSV file also.

| gene | alleles | counts | total | note | kh | kd | kt | kq | k_diff | GENOTYPED_ALLELES | Freq_by_Clone | Freq_by_Seq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IGHV1-2 | 04,02 | 5,3 | 8 | | -1.370836624 | 2.372557252 | 1.340383293 | -0.072656709 | 1.032173959 | 04,02 | 4;2 | 901;10 |
| IGHV1-3 | NA | NA | NA | NA | NA | NA | NA | NA | 1000 | Deletion | NA | NA |
| IGHV1-8 | 01,02 | 4,0 | 4 | | 2.271371433 | 1.395405278 | 0.705221364 | -0.086020671 | 0.875966155 | 1 | 4 | 376 |
| IGHV1-18 | 01,04 | 4,0 | 4 | | 2.271371433 | 1.395405278 | 0.705221364 | -0.086020671 | 0.875966155 | 1 | 3 | 67 |
| IGHV1-46 | 01,02,04 | 1,1,0 | 3 | | 0.049435941 | 1.443228259 | 1.214200925 | 0.875061263 | 0.229027334 | 01,02 | 1;1 | 294;12 |
| IGHV1-69 | 06,02,04,08 | 3,0,0,0 | 3 | | 2.041045349 | 1.384070733 | 0.866432798 | 0.273001272 | 0.656974616 | 6 | 2 | 22 |
| IGHV2-5 | 02,08,01,05 | 2,1,0,0 | 3 | | 0.51384523 | 1.68864601 | 1.287072697 | 0.750122527 | 0.401573313 | 02,08 | 0;0 | 54;0 |
| IGHV2-70 | 1 | 1 | 1 | | 1.367499276 | 1.148507737 | 0.975961759 | 0.77815125 | 0.218991539 | 1 | 1 | 13 |
| IGHV3-7 | 1 | 3 | 3 | | 2.041045349 | 1.384070733 | 0.866432798 | 0.273001272 | 0.656974616 | 1 | 3 | 15 |
| IGHV3-9 | 01,02 | 3,0 | 3 | | 2.041045349 | 1.384070733 | 0.866432798 | 0.273001272 | 0.656974616 | 1 | 3 | 41 |

schristley commented 3 years ago

Ported to stampede2, ran it on the same input, and it looks like I got the same results.
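
A check like this can be scripted rather than eyeballed. The following is a minimal sketch, assuming both runs write a tab-separated genotype.tsv with the `gene` and `GENOTYPED_ALLELES` columns shown in the excerpt above; the function names `load_genotype` and `diff_genotypes` are hypothetical and are not part of the App.

```python
import csv


def load_genotype(path):
    """Map each gene to its GENOTYPED_ALLELES value from a tab-separated genotype file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["gene"]: row["GENOTYPED_ALLELES"] for row in reader}


def diff_genotypes(path_a, path_b):
    """Return {gene: (call_a, call_b)} for genes whose calls differ.

    A gene present in only one file is reported with None on the other side.
    """
    calls_a = load_genotype(path_a)
    calls_b = load_genotype(path_b)
    return {
        gene: (calls_a.get(gene), calls_b.get(gene))
        for gene in sorted(set(calls_a) | set(calls_b))
        if calls_a.get(gene) != calls_b.get(gene)
    }
```

An empty dictionary from `diff_genotypes("gateway/genotype.tsv", "tacc/genotype.tsv")` (placeholder paths) would mean the two runs agree on every genotyped call. The comparison is a plain string match, so `1` and `01` would count as different.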

schristley commented 2 years ago

@bcorrie @ (Gur) We should update the vdjbase singularity image to use immcantation 4.2.0, which has a new IgBlast with some important bug fixes. We should also create a job submission JSON that builds the singularity image from the Docker image and commit it to the irplus/tapis repository; that will help document where the image comes from. When I searched on Docker Hub, I did find something about vdjbase, but it wasn't clear if that was the "official" one.
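
To illustrate the provenance idea, here is a rough sketch that assembles a `singularity build <image> docker://<reference>` command and records it as JSON next to the image. The Docker reference `example-org/vdjbase:latest` is a placeholder (the thread does not say which Docker Hub image is the official one), the function names are hypothetical, and the JSON layout is only an ad hoc record, not the Tapis job submission schema.

```python
import json


def singularity_build_command(docker_ref, image_path):
    """Return the argv list for building a Singularity image from a Docker reference."""
    return ["singularity", "build", image_path, f"docker://{docker_ref}"]


def image_provenance(docker_ref, image_path):
    """Return a JSON string recording where a Singularity image came from.

    Ad hoc record for documentation purposes; not a Tapis job definition.
    """
    record = {
        "image": image_path,
        "source": f"docker://{docker_ref}",
        "command": singularity_build_command(docker_ref, image_path),
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Placeholder reference: substitute the real Docker image once it is confirmed.
    print(image_provenance("example-org/vdjbase:latest", "vdjbase.sif"))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` on a system where Singularity is installed, while the JSON record can be committed alongside the job definition.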

bcorrie commented 2 years ago

@schristley should we mark this as closed? It is working, other issues such as commonality between TACC/CC are captured elsewhere (#84), and we should create a separate issue for an update to VDJBase itself if we need that...

schristley commented 2 years ago

Done, in tapis repository