Open pawanchk opened 1 year ago
Hi Pawan,
The coverage.stats file reports the following values in each line (list represents each column from left to right)
bam
: path of the bam file
wc_10000_median
: median coverage in sampled windows of size 10000bp
wc_10000_avg
: average coverage in sampled windows of size 10000bp
wc_10000_std
: std. dev of coverage in sampled windows if size 10000bp
wc_300_median
: median coverage in sampled windows of size 300bp
wc_300_avg
: average coverage in sampled windows of size 300bp
wc_300_std
: std. dev of coverage in sampled windows if size 300bp
read_length
: read length in bam file
insert_size
: average insert size
insert_std
: insert size standard deviation
min_insert
: minimum detected insert size
max_insert
: maximum detected insert size
pair_support
: number of read pairs of support required to call a discordant edge (SV)
percent_proper
: estimated fraction of properly-paired reads
num_sdevs
: number of standard deviations used for insert-size boundaries (this is a user-specifiable AA parameter)
bamfile_filesize
: filesize of bam file (to help ensure the bam file matches its stats when re-using these statistics for re-runs).
The bam
, pair_support
and num_sdevs
will be reported in the log files in a human-readable way.
It's important to note that these values are quickly computed estimates designed to help guide AA, and they are not exhaustive or exact summaries of those properties in the bam file. For more exact profiling of your bam file you may want to instead use samtools, or similar.
Thanks, Jens
Hi Jens,
Thanks for your response with the details, this is very helpful.
These are some follow-up queries that I have -
bam
, pair_support
and num_sdevs
?Hi Pawan,
The main [sample].log
file produced by AA will report on various lines a line of text that says pair support 2
(or similar) and the both bam file and insert sdevs will be reported there as well in one of the first lines that shows the command that was run for AA. If insert sdevs is not reported, then it is using the default 3. You may of course find that if you really need this information that the coverage.stats file will be easier to extract it from.
No, they are only reported in coverage.stats. They may also be printed to stdout when amplified_intervals.py
is run (assuming you are using AmpliconSuite-pipeline. These values certainly are important for AA to properly estimate bulk copy numbers using coverage, and also to identify SVs. AA would not run without having this information.
The best way to regenerate this data is to run AmpliconSuite-pipleline on your sample again (up to the point of generating seed regions - no need to set --run_AA
, --run_AC
).
Thanks, Jens
Hi,
I have two queries regarding the
coverage.stats
file that gets updated in thedata_repo
folder everytime we run AmpliconArchitect for a particular sample -coverage.stats
file also found in the AmpliconArchitect results folder for a particular sample (bam file) executed using AmpliconArchitect ?Any advice would be helpful.
Best Regards, Pawan