Closed HLHsieh closed 4 months ago
Hi
Can you try the following?
cd strspy/setup
bash STRspy_setup.sh
The setup directory contains the "environment.yml" file.
Hope this helps.
Hi,
Thank you! That worked and the test data worked smoothly as well. I have some questions about the input files and output.
What distinguishes testset/all_regions/test.regions.sort.named.bed from testset/testCustomDB/FGA.bed or vWA.bed?
I'm curious about the meaning of the RawCounts in the FGA_cA_15-0.5-1.minimap.sorted.bam_Allelefreqs.txt file. For instance, does the value "92" indicate the presence of sequences like FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_[GAAA]3_21 or sequences containing "AAAG"? I may have misunderstood this.
STR RawCounts NormalizedCounts
FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_[GAAA]3_21 92 1
FGA_[GGAA]2_GGAG_[AAAG]15_AGAA_AAAA_[GAAA]3_23 74 0.804348
FGA_[GGAA]2_GGAG_[AAAG]14_AGAA_AAAA_[GAAA]3_22 40 0.434783
FGA_[GGAA]2_GGAG_AAAG_AAG_[AAAG]13_AGAA_AAAA_[GAAA]3_22.3 26 0.282609
FGA_[GGAA]2_GGAG_[AAAG]14_AA_AAAA_[GAAA]3_21.2 26 0.282609
FGA_[GGAA]2_GGAG_[AAAG]12_AGAA_AAAA_[GAAA]3_20 22 0.23913
FGA_[GGAA]2_GGAG_[AAAG]15_AA_AAAA_[GAAA]3_22.2 10 0.108696
FGA_[GGAA]2_GGAG_[AAAG]11_AG_[AAAG]4_AGAA_AAAA_[GAAA]3_23.2 10 0.108696
FGA_[GGAA]2_GGAG_[AAAG]10_AGAA_AAAA_[GAAA]3_18 8 0.0869565
FGA_[GGAA]2_GGAG_[AAAG]5_AAGG_[AAAG]9_AGAA_AAAA_[GAAA]3_23 6 0.0652174
FGA_[GGAA]2_GGAG_[AAAG]17_AGAA_AAAA_[GAAA]3_25 6 0.0652174
FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_GAAA_AAAA_GAAA_21 6 0.0652174
FGA_[GGAA]2_GGAG_[AAAG]11_AGAA_AAAA_[GAAA]3_19 6 0.0652174
FGA_[GGAA]2_GGAG_[AAAG]9__AA_AAAA_[GAAA]3_16.2 4 0.0434783
FGA_[GGAA]2_GGAG_[AAAG]18_AA_AAAA_[GAAA]3_25.2 4 0.0434783
FGA_[GGAA]2_GGAG_[AAAG]16_AGAA_AAAA_[GAAA]3_24 4 0.0434783
FGA_[GGAA]2_GGAG_[AAAG]11_AA_AAAA_[GAAA]3_18.2 4 0.0434783
FGA_[GGAA]4_GGAG_[AAAG]3_[GAAG]3_[AAAG]15_AA_AAAA_[GAAA]4_31.2 2 0.0217391
FGA_[GGAA]2_GGAG_[AAAG]5_AAGG_[AAAG]12_AGAA_AAAA_[GAAA]3_26 2 0.0217391
FGA_[GGAA]2_GGAG_[AAAG]20_AGAA_AAAA_[GAAA]3_28 2 0.0217391
FGA_[GGAA]2_GGAG_[AAAG]16_AA_AAAA_[GAAA]3_23.2 2 0.0217391
Best, Hsin
I am glad that it worked for you. With reference to your questions, I encourage you to read our article (https://www.sciencedirect.com/science/article/abs/pii/S1872497321001654).
What distinguishes testset/all_regions/test.regions.sort.named.bed from testset/testCustomDB/FGA.bed or vWA.bed?
The file "test.regions.sort.named.bed" contains all regions from the STR bed files, such as FGA.bed and vWA.bed, that you want to verify for presence in your sample. It is used to provide overall coverage information in the STRspy output. Look for the directory called "GenomicMappingStats".
The example you posted shows the main output of STRspy. Raw counts indicate the coverage of the corresponding sequences (STR repeats), such as FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_[GAAA]3_21 = 92.
This example highlights the top two repeats for the sample (FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_[GAAA]3_21 and FGA_[GGAA]2_GGAG_[AAAG]15_AGAA_AAAA_[GAAA]3_23). These are likely the true genotype of the sample, indicating heterozygosity.
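To make the reading of the table concrete, here is a minimal Python sketch that parses the first rows of the Allelefreqs output quoted in this thread and picks the two alleles with the highest RawCounts. It is not STRspy's own code; the function names parse_allele_freqs and top_two are made up for illustration. In the quoted rows, NormalizedCounts matches RawCounts divided by the largest RawCounts value, so the sketch checks that relationship, which is an observation from this table and not a documented guarantee.

```python
# First rows of the Allelefreqs table quoted in this thread (whitespace-separated).
ALLELE_FREQS = """\
STR RawCounts NormalizedCounts
FGA_[GGAA]2_GGAG_[AAAG]13_AGAA_AAAA_[GAAA]3_21 92 1
FGA_[GGAA]2_GGAG_[AAAG]15_AGAA_AAAA_[GAAA]3_23 74 0.804348
FGA_[GGAA]2_GGAG_[AAAG]14_AGAA_AAAA_[GAAA]3_22 40 0.434783
FGA_[GGAA]2_GGAG_AAAG_AAG_[AAAG]13_AGAA_AAAA_[GAAA]3_22.3 26 0.282609
"""


def parse_allele_freqs(text):
    """Return (name, raw_count, normalized_count) tuples, skipping the header."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        name, raw, norm = line.split()
        rows.append((name, int(raw), float(norm)))
    return rows


def top_two(rows):
    """Hypothetical helper: the two alleles with the highest raw counts."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:2]


rows = parse_allele_freqs(ALLELE_FREQS)
max_raw = max(raw for _, raw, _ in rows)

# In these rows, NormalizedCounts equals RawCounts / max(RawCounts).
for name, raw, norm in rows:
    assert abs(raw / max_raw - norm) < 1e-5

calls = top_two(rows)
for name, raw, _ in calls:
    # The allele designation is the last underscore-separated field of the name.
    print(name.rsplit("_", 1)[-1], raw)
```

Taking the top two rows is only a simple way to read the table. It does not handle homozygous samples or decide how far below the top allele a second allele may fall.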
I hope this helps!
Best, Rupesh
Hi there,
I was trying to install this tool as follows:
However, I got the following error:
I did not see environment.yml under the strspy directory, which I cloned from your repo.
Any advice on this matter would be appreciated.
Best, Hsin