virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for upcoming improvements in runtime, accuracy and annotations for the GRCh38 human reference genome.

AA only processed data on chr 1, and got a warning about Font family #106

Open zhangheng43 opened 3 years ago

zhangheng43 commented 3 years ago

Dear author,

I'm running AmpliconArchitect on my WGS data. The run completed but produced a warning, and the result shows only one amplicon, on chr1:

[root:INFO] Commandline: /home/heng/ZhangHeng/AmpliconArchitect/src/AmpliconArchitect.py --ref GRCh38 --downsample 10 --bed /media/heng/D215server/Data/ZH_80-614826108_WGS/paaOut/PLC1_AA_CNV_SEEDS.bed --bam /media/heng/D215server/Data/ZH_80-614826108_WGS/Out/PLC1.cs.rmdup.bam --out /media/heng/D215server/Data/ZH_80-614826108_WGS/paaOut//PLC1_AA_results/PLC1
[root:INFO] AmpliconArchitect version 1.2
[root:INFO] #TIME 5.045 Loading libraries and reference annotations for: GRCh38
Global ref name is GRCh38
[root:INFO] #TIME 12.187 Initiating bam_to_breakpoint object for: /media/heng/D215server/Data/ZH_80-614826108_WGS/Out/PLC1.cs.rmdup.bam
[root:INFO] #TIME 12.188 Exploring interval: chr1 234347815 234867766
[root:INFO] #TIME 80.171 Searching new neighbors for interval: chr1 234347815 234887766
[root:INFO] #TIME 80.690 Calculating coverage meanshift segmentation
[root:INFO] #TIME 189.629 Detecting breakpoint edges
[root:INFO] #TIME 327.962 Selecting neighbors
[root:INFO] #TIME 383.094 Interval sets for amplicons determined:
[root:INFO] [amplicon1] chr1:234347815-234887766
[root:INFO] #TIME 383.096 Reconstructing amplicon1
[root:INFO] #TIME 383.096 Calculating coverage meanshift segmentation
[root:INFO] #TIME 383.096 Detecting breakpoint edges
[root:INFO] #TIME 383.345 Building breakpoint graph
[root:INFO] #TIME 383.594 Optimizing graph copy number flow
[root:INFO] #TIME 407.552 Plotting SV View
[root:INFO] #TIME 407.559 Plotting SV View for amplicon1
/usr/local/lib/python2.7/dist-packages/matplotlib/font_manager.py:1297: UserWarning: findfont: Font family [u'sans-serif'] not found. Falling back to DejaVu Sans
(prop.get_family(), self.defaultFamily[fontext]))
[root:INFO] #TIME 509.674 Total Runtime Completed

It seems that the other chromosomes have not been analyzed. Is this result normal, and how can I resolve the warning?

Best regards, Heng

jluebeck commented 3 years ago

Hi Heng,

The warning about font is harmless and does not affect results.

Regarding the chr1 amplicon: AA starts from copy number seed regions, which are identified as significantly amplified before AA is run and are supplied in the input bed file. AA examines these regions and brings connected regions into the amplicon when they are linked to it through discordant read pairs. AA is designed to study focal amplifications and, by design, does not analyze entire genomes. It only examines regions where external CNV calls have detected candidate amplifications. Are there other focal amplification regions you were expecting to see?
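
Since AA only examines the seed regions in the input bed file, a quick way to see which chromosomes it can report on is to count the seed intervals per chromosome. The following is a minimal sketch, not part of AA: the function name `count_seeds_per_chrom` is hypothetical, and the one-line example mirrors the interval that the log above reports exploring, because the actual contents of the seed file are not shown in this thread.

```python
from collections import Counter


def count_seeds_per_chrom(bed_lines):
    """Count seed intervals per chromosome from BED-formatted lines.

    Hypothetical helper for inspecting a seed file; it is not an AA function.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a valid BED record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end > start:
            counts[chrom] += 1
    return counts


# Assumed example: a seed file holding only the interval seen in the log.
example_bed = ["chr1\t234347815\t234867766"]
for chrom, n in sorted(count_seeds_per_chrom(example_bed).items()):
    print(chrom, n)
```

To check a real file, pass an open file handle instead of the example list, for instance `count_seeds_per_chrom(open("PLC1_AA_CNV_SEEDS.bed"))`. If only chr1 appears in the output, the upstream CNV calling step produced no seeds on other chromosomes, and AA has nothing else to examine.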

Best, Jens