Hi Daniel,
Thank you for your question. It looks like you ran some of the commands more than once, so your aGDS file already contains the apc_protein_function node, which caused the error. Could you paste the information on your chromosome 22 GDS file from before running "Step 3: Generate the annotated GDS (aGDS) file"?
Best, Xihao
Thanks for getting back to me. I remade the chr22 GDS file and uploaded it here https://drive.google.com/file/d/19YOYDN7A7Fodyrce_IkH2e6uG5ByKQKX/view?usp=share_link (it is different than the chr22 GDS file after I tried running step 3). This is the command and output when I remade the GDS file:
Rscript /project/ritchie07/personal/daniel/tools/STAARpipeline/convertVCF2GDS.R NULL vcf chr22_mac1_GDS 1 /project/ritchie07/personal/daniel/A6K/chr22_mac1.vcf.gz
[1] "NULL"
[2] "vcf"
[3] "chr22_mac1_GDS"
[4] "1"
[5] "/project/ritchie07/personal/daniel/A6K/chr22_mac1.vcf.gz"
[1] "/project/ritchie07/personal/daniel/A6K/chr22_mac1.vcf.gz"
Loading required package: gdsfmt
Running with 28 thread(s).
converting VCF
Tue Dec 6 11:52:49 2022
Variant Call Format (VCF) Import:
file(s):
chr22_mac1.vcf.gz (442.8M)
file format: VCFv4.2
the number of sets of chromosomes (ploidy): 2
the number of samples: 6,280
genotype storage: bit2
compression method: LZMA_RA
# of samples: 6280
calculating the total number of variants ...
the total number of variants for import: 1,641,932
Writing to 28 files:
chr22_mac1_GDS_tmp01_79b76e0809c [1..58,640]
chr22_mac1_GDS_tmp02_79b743259777 [58,641..117,282]
chr22_mac1_GDS_tmp03_79b739469439 [117,283..175,922]
chr22_mac1_GDS_tmp04_79b773c1248 [175,923..234,564]
chr22_mac1_GDS_tmp05_79b760d4a4e4 [234,565..293,204]
chr22_mac1_GDS_tmp06_79b718ae12bc [293,205..351,846]
chr22_mac1_GDS_tmp07_79b7250da3bb [351,847..410,486]
chr22_mac1_GDS_tmp08_79b7393bbfa8 [410,487..469,126]
chr22_mac1_GDS_tmp09_79b76daf5ccb [469,127..527,768]
chr22_mac1_GDS_tmp10_79b7320d43c1 [527,769..586,408]
chr22_mac1_GDS_tmp11_79b732eec028 [586,409..645,050]
chr22_mac1_GDS_tmp12_79b72971da6b [645,051..703,690]
chr22_mac1_GDS_tmp13_79b762beae63 [703,691..762,332]
chr22_mac1_GDS_tmp14_79b76832b9ca [762,333..820,972]
chr22_mac1_GDS_tmp15_79b712c800a5 [820,973..879,612]
chr22_mac1_GDS_tmp16_79b7670dd1a3 [879,613..938,254]
chr22_mac1_GDS_tmp17_79b713ae2e68 [938,255..996,894]
chr22_mac1_GDS_tmp18_79b74ffdc65c [996,895..1,055,536]
chr22_mac1_GDS_tmp19_79b749b31d96 [1,055,537..1,114,176]
chr22_mac1_GDS_tmp20_79b73144c505 [1,114,177..1,172,818]
chr22_mac1_GDS_tmp21_79b743cb1ae1 [1,172,819..1,231,458]
chr22_mac1_GDS_tmp22_79b72aa7f3ff [1,231,459..1,290,098]
chr22_mac1_GDS_tmp23_79b77a12170c [1,290,099..1,348,740]
chr22_mac1_GDS_tmp24_79b751c949b4 [1,348,741..1,407,380]
chr22_mac1_GDS_tmp25_79b72d0e378c [1,407,381..1,466,022]
chr22_mac1_GDS_tmp26_79b79b35265 [1,466,023..1,524,662]
chr22_mac1_GDS_tmp27_79b717fa32a2 [1,524,663..1,583,304]
chr22_mac1_GDS_tmp28_79b77536717a [1,583,305..1,641,932]
Done (Tue Dec 6 11:55:49 2022).
Output:
chr22_mac1_GDS.gds
Merging:
opening 'chr22_mac1_GDS_tmp01_79b76e0809c' ... [done]
opening 'chr22_mac1_GDS_tmp02_79b743259777' ... [done]
opening 'chr22_mac1_GDS_tmp03_79b739469439' ... [done]
opening 'chr22_mac1_GDS_tmp04_79b773c1248' ... [done]
opening 'chr22_mac1_GDS_tmp05_79b760d4a4e4' ... [done]
opening 'chr22_mac1_GDS_tmp06_79b718ae12bc' ... [done]
opening 'chr22_mac1_GDS_tmp07_79b7250da3bb' ... [done]
opening 'chr22_mac1_GDS_tmp08_79b7393bbfa8' ... [done]
opening 'chr22_mac1_GDS_tmp09_79b76daf5ccb' ... [done]
opening 'chr22_mac1_GDS_tmp10_79b7320d43c1' ... [done]
opening 'chr22_mac1_GDS_tmp11_79b732eec028' ... [done]
opening 'chr22_mac1_GDS_tmp12_79b72971da6b' ... [done]
opening 'chr22_mac1_GDS_tmp13_79b762beae63' ... [done]
opening 'chr22_mac1_GDS_tmp14_79b76832b9ca' ... [done]
opening 'chr22_mac1_GDS_tmp15_79b712c800a5' ... [done]
opening 'chr22_mac1_GDS_tmp16_79b7670dd1a3' ... [done]
opening 'chr22_mac1_GDS_tmp17_79b713ae2e68' ... [done]
opening 'chr22_mac1_GDS_tmp18_79b74ffdc65c' ... [done]
opening 'chr22_mac1_GDS_tmp19_79b749b31d96' ... [done]
opening 'chr22_mac1_GDS_tmp20_79b73144c505' ... [done]
opening 'chr22_mac1_GDS_tmp21_79b743cb1ae1' ... [done]
opening 'chr22_mac1_GDS_tmp22_79b72aa7f3ff' ... [done]
opening 'chr22_mac1_GDS_tmp23_79b77a12170c' ... [done]
opening 'chr22_mac1_GDS_tmp24_79b751c949b4' ... [done]
opening 'chr22_mac1_GDS_tmp25_79b72d0e378c' ... [done]
opening 'chr22_mac1_GDS_tmp26_79b79b35265' ... [done]
opening 'chr22_mac1_GDS_tmp27_79b717fa32a2' ... [done]
opening 'chr22_mac1_GDS_tmp28_79b77536717a' ... [done]
Digests:
sample.id [md5: a761962496b6b317bf251960be9c76b7]
variant.id [md5: 819a750296c70995fba8b9748ceec990]
position [md5: 950041008e64c71f6f9187d2c86da0e0]
chromosome [md5: b78a494dc5be8a12482aaacfa00b65c0]
allele [md5: 495a3512d3c6c197209ad91c86564c2e]
genotype [md5: 507c9f68d3039161f84c086de22588c3]
phase [md5: 13706a839e623a3b95e55afef017faec]
annotation/id [md5: 47b0eafc0f027da5320cfdc0a7efd78d]
annotation/qual [md5: 9d8f45b58e47bd77724a8b8cfde5a0a6]
annotation/filter [md5: 518197a19b03713e21a5fc174926226d]
annotation/info/PR [md5: b63f542998b4e725f47060b84b2cb3e8]
Done.
Tue Dec 6 11:56:56 2022
Optimize the access efficiency ...
Clean up the fragments of GDS file:
open the file 'chr22_mac1_GDS.gds' (114.5M)
# of fragments: 269
save to 'chr22_mac1_GDS.gds.tmp'
rename 'chr22_mac1_GDS.gds.tmp' (114.5M, reduced: 2.5K)
# of fragments: 56
Tue Dec 6 11:56:58 2022
File: /project/ritchie07/personal/daniel/A6K/STAARpipeline/chr22_mac1_GDS.gds
Format Version: v1.0
Reference: unknown
Ploidy: 2
Number of samples: 6,280
Number of variants: 1,641,932
Chromosomes:
Chr22: 1641932
Contigs:
22, 50808250
Alleles:
ALT: <None>
tabulation: 2, 1641932(100.0%)
Annotation, Quality:
Min: NA, 1st Qu: NA, Median: NA, Mean: NaN, 3rd Qu: NA, Max: NA, NA's: 1641932
Annotation, FILTER:
<None>
Annotation, INFO variable(s):
PR, 0, Flag, Provisional reference allele, may not be based on real reference genome
Annotation, FORMAT variable(s):
GT, 1, String, Genotype
Annotation, sample variable(s):
<None>
Hi Daniel,
Thanks for including the output log from generating the GDS files. These GDS files should be the Step 1 input of the FAVORannotator program. Now that you have run Step 1 and Step 2 of FAVORannotator successfully, could you please make a copy of these GDS files and rerun Step 3 of FAVORannotator on top of this copy?
Please let us know if you encounter the same issue (i.e., The GDS node "apc_protein_function" exists) again.
Best, Xihao
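The "make a copy and rerun" advice above can be sketched as a small shell helper. This is only an illustration under assumptions: the file names are placeholders rather than paths from this thread, the helper name fresh_copy is hypothetical, and a plain text file stands in for the GDS file so the snippet runs anywhere. How Step 3 is pointed at the working copy depends on your own FAVORannotator setup and is not shown.

```shell
#!/bin/sh
# Sketch: keep a pristine GDS file and run annotation on a fresh working copy.
# All file names are placeholders; fresh_copy is a hypothetical helper.

# Copy a pristine file to a working path, replacing any stale working copy.
fresh_copy() {
    src=$1
    dst=$2
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    cp -f "$src" "$dst"
}

# Demo with a stand-in file (not a real GDS file).
workdir=$(mktemp -d)
printf 'pristine GDS stand-in\n' > "$workdir/chr22.pristine.gds"

fresh_copy "$workdir/chr22.pristine.gds" "$workdir/chr22.working.gds"

# Simulate a failed run that partially modifies the working copy.
printf 'partial annotation\n' >> "$workdir/chr22.working.gds"

# Before retrying, reset the working copy from the pristine file.
fresh_copy "$workdir/chr22.pristine.gds" "$workdir/chr22.working.gds"
cmp -s "$workdir/chr22.pristine.gds" "$workdir/chr22.working.gds" \
    && echo "working copy reset"
```

Resetting from the pristine file before every retry means a failed run cannot leave a half-written node behind for the next attempt.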
I just tried re-running Step 3 using the new chr22 GDS file but unfortunately had the same 'The GDS node "apc_protein_function" exists' issue.
Hi Daniel,
Thanks for letting me know. In this case, could you please paste the output of head(FunctionalAnnotation), dim(FunctionalAnnotation), and colnames(FunctionalAnnotation) when running through this line of the Step 3 script?
Best, Xihao
Thanks again for the help -- below are the commands and their outputs:
Hi Daniel,
This is very helpful. You seem to be using the FAVOR Full Database to annotate the GDS file. However, Step 2 of FAVORannotator should use the FAVOR Essential Database to annotate the GDS file.
Hope this helps, and please let me know how it goes. Thank you.
Best, Xihao
Hi Xihao,
Thanks a lot, it seems to be working now. I'll check back if I'm having other issues.
Hi Daniel,
Thanks so much for letting me know.
Best, Xihao
Hi Xihao, I'm trying to run STAARpipeline but am running into an issue in "Step 3: Generate the annotated GDS (aGDS) file". Below is the command, tested on chromosome 22 with paths changed, and the error:
Rscript gds2agds.R 22
If I run the same command again I actually get a different error, and it seems to stay like this (and the runtime also shortened to ~5 seconds from a couple minutes):
Would you know what the problem is? Thanks.
Daniel
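A different error and a much shorter runtime on the second attempt suggest that the first run may have modified the GDS file before failing. One way to check this, sketched here under assumptions, is to compare a checksum of the file taken before and after a run. The file below is a plain text stand-in and the "run" is simulated; neither is the real pipeline, and the helper name checksum is hypothetical.

```shell
#!/bin/sh
# Sketch: detect whether a command changed a file by comparing checksums.
# The file and the "run" below are stand-ins, not the real pipeline.

# Print a checksum of the file's content (cksum is a POSIX utility).
checksum() {
    cksum < "$1"
}

workdir=$(mktemp -d)
gds="$workdir/chr22.example.gds"
printf 'GDS stand-in\n' > "$gds"

before=$(checksum "$gds")

# Stand-in for a run that fails partway after writing to the file.
printf 'half-written node\n' >> "$gds"

after=$(checksum "$gds")

if [ "$before" = "$after" ]; then
    status="unchanged"
else
    status="modified"
fi
echo "file is $status since the last run"
```

If the checksum differs after a failed run, the file is no longer the pristine Step 1 output, and later runs start from a different state than the first.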