plantinformatics / pretzel

Javascript full-stack framework for Big Data visualisation and analysis
GNU General Public License v3.0

Search all .vcf.gz files of the selected dataset #391

Closed Don-Isdale closed 3 days ago

Don-Isdale commented 3 weeks ago

Part of #383


Observable outcomes :

This enables the User to search for the given SNP names across all chromosomes of the selected dataset, and request the lookup of Genotype values for those SNP names.

Measure with :

Run the script either from the command line or by adding parameters to a request sent by the application. Confirm from the trace that the correct lookup command is run, and that the results are correct, cover all chromosomes of the dataset, and apply to the requested SNP names.


Task Sequence :


Don-Isdale commented 3 weeks ago

Test

This facility is tested using the prototype 'Genotype Search' panel / dialog, which provides parameters : dataset Id, Sample names, SNP names.

Test Case

Using the dataset Id, the list of non-link .vcf.gz files is requested.

Results

Server log extract

Confirming that the list of .vcf.gz file names excluding soft-links is requested.

childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app

+ scope=noLinks
+ cd tmp/vcf/201028_40K_DAS5_samples_XT_exomeIDs
+ '[' noLinks = noLinks ']'

::ffff:127.0.0.1 - - [06/Jun/2024:18:26:52 +0000] "GET /api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs HTTP/1.1" 200 605 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"

API Request

http://localhost:3000/api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs

params :
id=201028_40K_DAS5_samples_XT_exomeIDs

API Result

(Line breaks added for readability using : replace-string "\\n" "\n")

{"text":"scope=
 3170787 Jun  6 21:15 1A_copy.MAF.vcf.gz
  162731 Jun  6 21:15 1A_copy.MAF.vcf.gz.csi
 2671431 Aug  9  2022 1A_copy.vcf.gz
  165069 Jun  6 21:15 1A_copy.vcf.gz.csi
  463961 Jan 29 12:12 1A.MAF.SNPList.vcf.gz
  149577 Jan 29 12:12 1A.MAF.SNPList.vcf.gz.csi
 3164247 Jan 29 12:10 1A.MAF.vcf.gz
  159238 Jan 29 12:10 1A.MAF.vcf.gz.csi
  159118 Jan 29 12:07 1A.vcf.gz.csi
  463961 Jan 29 16:04 1B.MAF.SNPList.vcf.gz
  149577 Jan 29 16:04 1B.MAF.SNPList.vcf.gz.csi
 3164248 Jan 29 16:04 1B.MAF.vcf.gz
  159236 Jan 29 16:04 1B.MAF.vcf.gz.csi
  159118 Jan 29 16:04 1B.vcf.gz.csi
"}
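The file names can be picked out of the listing above along these lines. This is a minimal sketch, not code from Pretzel: the helper name `vcfGzFileNames` is hypothetical, and it assumes each listing line has the form `<size> <month> <day> <time-or-year> <name>` as in the result shown.

```javascript
// Hypothetical helper (not part of Pretzel): extract the .vcf.gz file names
// from the "text" field of a vcfGenotypeFeaturesCountsStatus result.
// Assumption: each listing line is "<size> <month> <day> <time-or-year> <name>".
function vcfGzFileNames(text) {
  return text
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    // A listing line has exactly 5 fields and starts with a numeric size;
    // this skips the leading "scope=" line and any blank lines.
    .filter((fields) => fields.length === 5 && /^\d+$/.test(fields[0]))
    .map((fields) => fields[4])
    // Keep the data files and drop the .csi index files.
    .filter((name) => name.endsWith('.vcf.gz'));
}

// A few lines copied from the API result above.
const listingSample = [
  'scope=',
  ' 3170787 Jun  6 21:15 1A_copy.MAF.vcf.gz',
  '  162731 Jun  6 21:15 1A_copy.MAF.vcf.gz.csi',
  ' 2671431 Aug  9  2022 1A_copy.vcf.gz',
  '  165069 Jun  6 21:15 1A_copy.vcf.gz.csi',
  '',
].join('\n');

console.log(vcfGzFileNames(listingSample));
// prints [ '1A_copy.MAF.vcf.gz', '1A_copy.vcf.gz' ]
```

Filtering on the `.vcf.gz` suffix rather than excluding `.csi` keeps the helper from returning any other file types that may be present in the dataset directory.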

Test Case

In this case there is 1 non-link .vcf.gz file, and this file name is included as a parameter in the following request.

Results

Server log extract

Confirming that the correct .vcf.gz is used.


::ffff:127.0.0.1 - - [06/Jun/2024:12:24:49 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 - "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
The request processing time is 50.705 ms. for /vcfGenotypeLookupPost
vcfGenotypeLookup 201028_40K_DAS5_samples_XT_exomeIDs undefined Numerical 74 [
  'query',
  '201028_40K_DAS5_samples_XT_exomeIDs',
  '1A_copy.vcf.gz',
  '',
  '',
  '',
  '',
  '-queryStart',
  '-H',
  '-f',
  '%ID\t%POS\t%REF\t%ALT\t%INFO[\t%GT]\n',
  '-queryEnd'
]
childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app

+ bcftoolsCommand query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz '' '' -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317

+ vcfGz=201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz

+ echo isecDatasetIdsArray : 0 , vcfGzs 0 , snpNames 1 scaffold38755_1235130 scaffold38755_1337276

+ bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317 -H -f '%ID    %POS    %REF    %ALT    %INFO[  %GT]
' -i ' ID="scaffold38755_1235130" || ID="scaffold38755_1337276" '

cbWrap null #[1]ID  [2]POS  [3]REF  [4]ALT  [5](null)   [6]ExomeCapture-DAS5-001803:GT  [7]ExomeCapture-DAS5-001365:GT  [8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130   1235130 C   T   F_MISSING=0.0259067;NS=564;AN=1128; undefined

::ffff:127.0.0.1 - - [06/Jun/2024:12:24:51 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 387 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
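The `-i` expression seen in the trace above, `ID="..." || ID="..."`, can be derived from the newline-separated `snpNames` value roughly as follows. This is an illustrative sketch only: the function name `snpNamesIncludeFilter` and the character check are assumptions, and the actual construction in vcfGenotypeLookup.bash may differ.

```javascript
// Hypothetical helper (not part of Pretzel): build a bcftools -i expression
// which matches any of the given SNP names.
// snpNames is newline-separated, as in the POST data of this request.
function snpNamesIncludeFilter(snpNames) {
  const names = snpNames
    .split('\n')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  for (const name of names) {
    // Assumption: SNP names contain only word characters, '.', ':' and '-'.
    // Rejecting anything else keeps quotes out of the expression.
    if (!/^[\w.:-]+$/.test(name)) {
      throw new Error(`Unexpected character in SNP name: ${name}`);
    }
  }
  return names.map((name) => `ID="${name}"`).join(' || ');
}

console.log(snpNamesIncludeFilter('scaffold38755_1235130\nscaffold38755_1337276'));
// prints ID="scaffold38755_1235130" || ID="scaffold38755_1337276"
```

The resulting string would be passed as a single argument after `-i`, so no shell quoting is involved when the command is started with an argument array.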

API Request

http://localhost:3000/api/Blocks/vcfGenotypeLookupPost

POST Data

(Line breaks added for readability using : replace-string "," ",\n")

{"datasetId":"201028_40K_DAS5_samples_XT_exomeIDs",
  "preArgs":{
    "samples":"ExomeCapture-DAS5-001803\nExomeCapture-DAS5-001365\nExomeCapture-DAS5-002317",
    "requestInfo":false,
    "requestFormat":"Numerical",
    "requestSamplesAll":false,
    "snpPolymorphismFilter":false,
    "mafThreshold":0,
    "mafUpper":false,
    "featureCallRateThreshold":0,
    "datasetVcfFile":"1A_copy.vcf.gz",
    "snpNames":"scaffold38755_1235130\nscaffold38755_1337276"},
  "nLines":100,
  "options":{}}
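To cover every .vcf.gz file of the dataset, one request body of the above shape could be built per file. The sketch below assumes the iteration happens on the client; this thread does not state whether the client or the server loops over the files. The function name `buildLookupRequests` is hypothetical, and the field values are copied from the POST data above.

```javascript
// Hypothetical helper (not part of Pretzel): build one vcfGenotypeLookupPost
// body per .vcf.gz file, with the same field layout as the POST data above.
// Assumption: the client sends one request per file.
function buildLookupRequests(datasetId, vcfFiles, samples, snpNames) {
  return vcfFiles.map((datasetVcfFile) => ({
    datasetId,
    preArgs: {
      // Sample and SNP names are sent as newline-separated strings.
      samples: samples.join('\n'),
      requestInfo: false,
      requestFormat: 'Numerical',
      requestSamplesAll: false,
      snpPolymorphismFilter: false,
      mafThreshold: 0,
      mafUpper: false,
      featureCallRateThreshold: 0,
      datasetVcfFile,
      snpNames: snpNames.join('\n'),
    },
    nLines: 100,
    options: {},
  }));
}

const lookupBodies = buildLookupRequests(
  '201028_40K_DAS5_samples_XT_exomeIDs',
  ['1A_copy.vcf.gz'],
  ['ExomeCapture-DAS5-001803', 'ExomeCapture-DAS5-001365', 'ExomeCapture-DAS5-002317'],
  ['scaffold38755_1235130', 'scaffold38755_1337276']
);
console.log(JSON.stringify(lookupBodies[0], null, 2));
```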

API Result


{"text":"#[1]ID\t[2]POS\t[3]REF\t[4]ALT\t[5](null)\t[6]ExomeCapture-DAS5-001803:GT\t[7]ExomeCapture-DAS5-001365:GT\t[8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130\t1235130\tC\tT\tF_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12\t0/0\t0/0\t0/0
scaffold38755_1337276\t1337276\tG\tC\tF_MISSING=0.0138169;NS=571;AN=1142;MAF=0.400175;AC=457;AC_Het=1\t0/0\t0/0\t1/1
"}
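To check that the result applies to the requested SNP names and samples, the tab-separated text above can be parsed into one object per SNP. This is a minimal sketch under the assumption that the header has the `#[n]Name` form shown; the function name `parseGenotypeText` is hypothetical.

```javascript
// Hypothetical helper (not part of Pretzel): parse the "text" field of a
// vcfGenotypeLookupPost result into an array of row objects keyed by column name.
function parseGenotypeText(text) {
  const lines = text.split('\n').filter((line) => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  // Header columns look like "#[1]ID" or "[6]SampleName:GT";
  // strip the leading '#', the "[n]" index and the ":GT" suffix.
  const header = lines[0]
    .replace(/^#/, '')
    .split('\t')
    .map((column) => column.replace(/^\[\d+\]/, '').replace(/:GT$/, ''));
  return lines.slice(1).map((line) => {
    const values = line.split('\t');
    const row = {};
    header.forEach((name, i) => {
      row[name] = values[i];
    });
    return row;
  });
}

// The API result above, with tabs and newlines written as escapes.
const resultText = [
  '#[1]ID\t[2]POS\t[3]REF\t[4]ALT\t[5](null)\t[6]ExomeCapture-DAS5-001803:GT\t[7]ExomeCapture-DAS5-001365:GT\t[8]ExomeCapture-DAS5-002317:GT',
  'scaffold38755_1235130\t1235130\tC\tT\tF_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12\t0/0\t0/0\t0/0',
  'scaffold38755_1337276\t1337276\tG\tC\tF_MISSING=0.0138169;NS=571;AN=1142;MAF=0.400175;AC=457;AC_Het=1\t0/0\t0/0\t1/1',
  '',
].join('\n');

const genotypeRows = parseGenotypeText(resultText);
console.log(genotypeRows.map((row) => row.ID));
// prints [ 'scaffold38755_1235130', 'scaffold38755_1337276' ]
console.log(genotypeRows[1]['ExomeCapture-DAS5-002317']);
// prints 1/1
```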