ixxmu / mp_duty

抓取网络文章到github issues保存
https://archives.duty-machine.now.sh/
121 stars 30 forks source link

WGS数据使用AA软件进行分析--以肺癌细胞系NCI-H524为例 #3671

Closed ixxmu closed 1 year ago

ixxmu commented 1 year ago

https://mp.weixin.qq.com/s/PqZWZ1e_vK5AuY1XIgUD3g

ixxmu commented 1 year ago

WGS数据使用AA软件进行分析--以肺癌细胞系NCI-H524为例 by 东林的扯淡小屋

#下载数据

wget -c https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8670694/SRR8670694fastq-dump --split-files --gzip SRR8670694.sra

#数据过滤

conda activate rnaseq
trim_galore -q 25 --phred33 --length 36 --paired  -j 70 SRR8670694_1.fastq.gz SRR8670694_2.fastq.gz  -o ./clean

conda activate WGS#成功运行:python /data/yudonglin/software/AmpliconArchitect/AmpliconSuite-pipeline/PrepareAA.py -s SRR8670694_sample  -t 60 --cnvkit_dir /home/yudonglin/miniconda3/envs/WGS/bin/cnvkit.py --fastqs SRR8670694_1_val_1.fq.gz SRR8670694_2_val_2.fq.gz --ref hg38 --run_AA

2023-07-23 05:03:14.697320

PrepareAA version 0.1203.13


Running PrepareAA on sample: SRR8670694_sample

Running pipeline on SRR8670694_1_val_1.fq.gz SRR8670694_2_val_2.fq.gz

/data3/yudonglin/A549-3/clean/SRR8670694_sample

Checking for ref index


Performing alignment and sorting

{ bwa mem -K 10000000 -t 60 /data/yudonglin/software/AmpliconArchitect/data_repo/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa SRR8670694_1_val_1.fq.gz SRR8670694_2_val_2.fq.gz | samtools view -Shu - | samtools sort -m 4G -@4 -o /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.bam -; } 2>/data3/yudonglin/A549-3/clean/SRR8670694_sample_aln_stage.stderr


Performing duplicate removal & indexing

samtools rmdup -s /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.bam /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libtinfow.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)


Running samtools index

samtools index /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libtinfow.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)

Removing temp BAM

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libtinfow.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)

samtools: /home/yudonglin/miniconda3/envs/WGS/bin/../lib/libncursesw.so.6: no version information available (required by samtools)

/data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam: 1022810262 + 0 properly paired (88.22% : N/A)

WARNING: BAM FILE PROPERLY PAIRED RATE IS BELOW 95%.

Quality of data may be insufficient for AA analysis. Poorly controlled insert size distribution during sample prep can cause high fractions of read pairs to be marked as discordant during alignment. Artifactual short SVs and long runtimes may occur!



Running CNVKit batch

python3 /home/yudonglin/miniconda3/envs/WGS/bin/cnvkit.py batch -m wgs -r /data/yudonglin/software/AmpliconArchitect/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 60 -d /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/ /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam

CNVkit 0.9.10.dev0

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/GRCh38_cnvkit_filtered_ref.target-tmp.bed with 568558 regions

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/GRCh38_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions

Running 1 samples in 60 processes (that's 60 processes per bam)

Running the CNVkit pipeline on /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam ...

Processing reads in SRR8670694_sample.cs.rmdup.bam

Time: 216.363 seconds (6943300 reads/sec, 2628 bins/sec)

Summary: #bins=568558, #reads=1502274039, mean=2642.2529, min=0.0, max=606025.4705882353 

Percent reads in regions: 129.640 (of 1158804890 mapped)

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.targetcoverage.cnn with 568558 regions

Skip processing SRR8670694_sample.cs.rmdup.bam with empty regions file /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/GRCh38_cnvkit_filtered_ref.antitarget-tmp.bed

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.antitargetcoverage.cnn with 0 regions

Processing target: SRR8670694_sample.cs.rmdup

Keeping 566505 of 568558 bins

Correcting for GC bias...

Processing antitarget: SRR8670694_sample.cs.rmdup

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cnr with 566505 regions

Segmenting /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cnr ...

Segmenting with method 'cbs', significance threshold 1e-06, in 60 processes

Smoothing overshot at 1 / 19505 indices: (-1.321697573382838, 7.849425428882503) vs. original (-1.609781885385645, 7.817937509394677)

Smoothing overshot at 2 / 17667 indices: (-27.049602998688012, 1.9048589458504748) vs. original (-25.1983685108177, 2.1253636039981085)

Smoothing overshot at 11 / 7049 indices: (-26.288755393159832, 2.618149238619435) vs. original (-25.068497981035453, 1.9035427835484244)

Smoothing overshot at 6 / 7464 indices: (-27.171883912648997, 2.934957575643578) vs. original (-24.922008989382192, 2.2795211205607)

Smoothing overshot at 2 / 5172 indices: (-11.364506253155607, 1.1378445459701272) vs. original (-25.257363073869485, 1.038324612692872)

Smoothing overshot at 18 / 1974 indices: (-25.200136795298555, 0.5372737865749664) vs. original (-24.321409203916218, 1.090893821291573)

Smoothing overshot at 51 / 3151 indices: (-25.337007954954007, 0.5936791531093732) vs. original (-24.318305724090017, 1.1472571176966293)

Post-processing /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cns ...

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cns with 828 regions

Applying filter 'ci'

Filtered by 'ci' from 828 to 246 rows

Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.call.cns with 246 regions

Significant hits in 8316/566505 bins (1.47%)

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.bintest.cns with 8316 regions


Running CNVKit segment

python3 /home/yudonglin/miniconda3/envs/WGS/bin/cnvkit.py segment /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cnr  -p 60 -m cbs -o /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cns

Segmenting with method 'cbs', significance threshold 0.0001, in 60 processes

Smoothing overshot at 1 / 19505 indices: (-1.3216986799771169, 7.849427274274742) vs. original (-1.60978, 7.81794)

Smoothing overshot at 2 / 17667 indices: (-27.049587363116196, 1.9048587070968932) vs. original (-25.1984, 2.12536)

Smoothing overshot at 6 / 7464 indices: (-27.171875589582953, 2.934957572862511) vs. original (-24.922, 2.27952)

Smoothing overshot at 2 / 5172 indices: (-11.36452854288826, 1.137845722521361) vs. original (-25.2574, 1.03832)

Smoothing overshot at 11 / 7049 indices: (-26.288746304090807, 2.618151661749489) vs. original (-25.0685, 1.90354)

Smoothing overshot at 18 / 1974 indices: (-25.200134183129684, 0.5372743545007284) vs. original (-24.3214, 1.09089)

Smoothing overshot at 51 / 3151 indices: (-25.337024731855248, 0.5936791794311227) vs. original (-24.3183, 1.14726)

Wrote /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cns with 981 regions


Cleaning up temporary files

rm /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output//*tmp.bed /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output//*.cnn /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output//*target.bed

rm: cannot remove '/data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output//*target.bed': No such file or directory

gzip -f /data3/yudonglin/A549-3/clean/SRR8670694_sample_cnvkit_output/SRR8670694_sample.cs.rmdup.cnr


Running amplified_intervals

python /data/yudonglin/software/AmpliconArchitect/src/amplified_intervals.py --ref GRCh38 --bed /data3/yudonglin/A549-3/clean//SRR8670694_sample.cs.rmdup_CNV_CALLS_pre_filtered.bed --bam /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam --gain 4.5 --cnsize_min 50000 --out /data3/yudonglin/A549-3/clean/SRR8670694_sample_AA_CNV_SEEDS

Global ref name is GRCh38

read length: 96.11004886900893 insert size: 345.02664366910676 insert std dev: 51.59025753467995 max_insert: 499.7974162731466 percent proper: 0.909206083584083 num_sdevs 3

coverage stats (32.645319068895446, 34.472737693749586, 11.478340283610251, 33.526761233375204, 34.13671911103693, 15.171939492282174, 96.11004886900893, 345.02664366910676, 51.59025753467995, 190.2558710650669, 499.7974162731466, 4, 0.909206083584083, 3, 90789268091) 992

pair support 4

python /data/yudonglin/software/AmpliconArchitect/src/AmpliconArchitect.py --ref GRCh38 --downsample 10 --bed /data3/yudonglin/A549-3/clean/SRR8670694_sample_AA_CNV_SEEDS.bed --bam /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /data3/yudonglin/A549-3/clean/SRR8670694_sample_AA_results//SRR8670694_sample

[root:INFO] Commandline: /data/yudonglin/software/AmpliconArchitect/src/AmpliconArchitect.py --ref GRCh38 --downsample 10 --bed /data3/yudonglin/A549-3/clean/SRR8670694_sample_AA_CNV_SEEDS.bed --bam /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /data3/yudonglin/A549-3/clean/SRR8670694_sample_AA_results//SRR8670694_sample 

[root:INFO] AmpliconArchitect version 1.3.r2


[root:INFO] Python version 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]


[root:INFO] #TIME 0.195 Loading libraries and reference annotations for: GRCh38

Global ref name is GRCh38

[root:INFO] #TIME 2.080 Initiating bam_to_breakpoint object for: /data3/yudonglin/A549-3/clean/SRR8670694_sample.cs.rmdup.bam

[root:INFO] #TIME 2.080 Reusing cstats from /data/yudonglin/software/AmpliconArchitect/data_repo/coverage.stats

[root:INFO] #TIME 2.081 Exploring interval: chr3 108875571 109295572

[root:INFO] #TIME 69.616 Calculating coverage meanshift segmentation

[root:INFO] #TIME 153.884 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 168.124 Selecting neighbors

[root:INFO] #TIME 168.124 Exploring interval: chr8 125643827 126518820

[root:INFO] #TIME 535.100 Calculating coverage meanshift segmentation

[root:INFO] #TIME 3988.971 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 18509.240 Selecting neighbors

[root:INFO] #TIME 18646.842 New neighbor: chr3 111445238 111655737 158 (weight=158)

[root:INFO] #TIME 18646.842 New neighbor: chr8 120418251 120658750 1224 (weight=1224)

[root:INFO] #TIME 18646.842 New neighbor: chr8 126990680 127001179 8 (weight=8)

[root:INFO] #TIME 18647.346 New neighbor: chr8 127228812 127918806 1428 (weight=1428)

[root:INFO] #TIME 18647.847 Calculating coverage meanshift segmentation

[root:INFO] #TIME 18706.512 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 19583.759 Selecting neighbors

[root:INFO] #TIME 19584.250 Calculating coverage meanshift segmentation

[root:INFO] #TIME 19596.280 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 19660.941 Selecting neighbors

[root:INFO] #TIME 19683.451 Calculating coverage meanshift segmentation

[root:INFO] #TIME 19683.465 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 19683.963 Selecting neighbors

[root:INFO] #TIME 19683.963 Exploring interval: chr8 127228812 127918806

[root:INFO] #TIME 19685.445 Calculating coverage meanshift segmentation

[root:INFO] #TIME 19708.337 Detecting breakpoint edges (interval neighbors)

[root:INFO] #TIME 34246.139 Selecting neighbors

[root:INFO] #TIME 34307.060 New neighbor: chr3 111445238 111655737 158 (weight=158)

[root:INFO] #TIME 34307.060 New neighbor: chr8 120418251 120658750 1224 (weight=1224)

[root:INFO] #TIME 34307.060 New neighbor: chr8 125543827 126618820 1428 (weight=1428)

[root:INFO] #TIME 34307.060 New neighbor: chr8 126990680 127001179 8 (weight=8)

[root:INFO] #TIME 34307.061 Interval sets for amplicons determined: 

[root:INFO] [amplicon1] chr3:108875571-109315572

[root:INFO] [amplicon2] chr3:111445238-111655737,chr8:120418251-120658750,chr8:125543827-126618820,chr8:126990680-127001179,chr8:127128812-128018806

[root:INFO] #TIME 34307.062 Reconstructing amplicon1

[root:INFO] #TIME 34307.062 Calculating coverage meanshift segmentation

[root:INFO] #TIME 34307.062 Detecting breakpoint edges (interval filter vertices)

[root:INFO] #TIME 34307.063 Building breakpoint graph

[root:INFO] #TIME 34307.114 Optimizing graph copy number flow

Traceback (most recent call last):

  File "/data/yudonglin/software/AmpliconArchitect/src/AmpliconArchitect.py", line 321, in <module>

    bamFileb2b.interval_filter_vertices(ilist, amplicon_name=amplicon_name, runmode=args.runmode)

  File "/data/yudonglin/software/AmpliconArchitect/src/bam_to_breakpoint.py", line 2192, in interval_filter_vertices

    task.putSCeval(opro, oprjo, oprfo, oprgo, oprho)

  File "/data/yudonglin/software/mosek/8/tools/platform/linux64x86/python/3/mosek/__init__.py", line 2181, in putSCeval

    __load_scopt__()

  File "/data/yudonglin/software/mosek/8/tools/platform/linux64x86/python/3/mosek/__init__.py", line 951, in __load_scopt__

    __scopt__ = __library_factory__(os.path.join(__dlldir__,__makelibname('mosekscopt%d_%d' % (mskmajorver.value,mskminorver.value))))

  File "/home/yudonglin/miniconda3/envs/WGS/lib/python3.10/posixpath.py", line 76, in join

    a = os.fspath(a)

TypeError: expected str, bytes or os.PathLike object, not NoneType

Completed


2023-07-23 20:16:21.340538

分析结果,好像没有什么结果hhh。

wget -c https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8670694/SRR8670694fastq-dump --split-files --gzip SRR8670694.sra 
conda activate rnaseq
trim_galore -q 25 --phred33 --length 36 --paired -j 70 SRR8670694_1.fastq.gz SRR8670694_2.fastq.gz -o ./clean

cd cleanbwa mem -q /data/yudonglin/reference/human/hg38/bwa/hg38.fa SRR8670694_1_val_1.fq.gz SRR8670694_2_val_2.fq.gz > SRR8670694.sam -t 60samtools sort -o sorted_SRR8670694.bam SRR8670694.sam -@ 60samtools index sorted_SRR8670694.bam#提取八号染色体的结果145,138,636samtools view -hb -@ 8 -b sorted_SRR8670694.bam chr8:0-145,138,636 > chr8.region.bamsamtools index chr8.region.bam
conda activate rnaseqcd /data3/yudonglin/A549-3/cleansamtools view -hb -@ 8 -b sorted_SRR8670694.bam chr2:20,100,000-20,200,000 > target.region.bamsamtools index target.region.bam#https://www.jianshu.com/p/20d785fc172e
chr3 108875571 109295572 6.017690882354896 /data3/yudonglin/A549-3/clean//SRR8670694_sample.cs.rmdup_CNV_CALLS_pre_filtered.bedchr8 125643827 126518820 223.6389796995142 /data3/yudonglin/A549-3/clean//SRR8670694_sample.cs.rmdup_CNV_CALLS_pre_filtered.bedchr8 127228812 127918806 177.1392370514879 /data3/yudonglin/A549-3/clean//SRR8670694_sample.cs.rmdup_CNV_CALLS_pre_filtered.bed
samtools view -hb -@ 8 -b sorted_SRR8670694.bam chr3:108875571-109295572 > target.region.bamsamtools index target.region.bam



conda activate WGS#成功运行:python /data/yudonglin/software/AmpliconArchitect/AmpliconSuite-pipeline/PrepareAA.py -s SRR8670694_sample -t 60 --cnvkit_dir /home/yudonglin/miniconda3/envs/WGS/bin/cnvkit.py --fastqs SRR8670694_1_val_1.fq.gz SRR8670694_2_val_2.fq.gz --ref hg38 --run_AA