Closed martinghunt closed 6 years ago
Also, if a read maps to two different sites, don't just add its counts in. Exclude, store to different array, or random assignment
../quasimap_output_directory/allele_base_counts.json
{
"allele_base_counts": [
[
[0, 0, 0],
[1, 1, 0]
],
[
[0, 0, 1],
[2, 2, 0],
[2, 2, 0, 1, 3]
]
]
}
import json
json_data = json.loads(...)
allele_base_counts = json_data["allele_base_counts"]
variant_site = allele_base_counts[0]
allele = variant_site[0]
base_count_coverage = allele[0]
../quasimap_output_directory/grouped_allele_counts.json
{
"grouped_allele_counts": {
"site_counts": [
{
"0": 10,
"1": 3,
"14": 10
},
{
"3": 30,
"2": 2,
"14": 1
}
],
"allele_groups": {
"0": [0, 2],
"1": [0, 2, 3],
"2": [0, 2, 4],
"3": [2, 5],
"14": [7, 8]
}
}
}
import json
json_data = json.loads(...)
grouped_allele_counts = json_data["grouped_allele_counts"]
variant_sites = grouped_allele_counts["site_counts"]
allele_groups = grouped_allele_counts["allele_groups"]
site = variant_sites[0]
for allele_group_id, count in site.iter():
allele_ids = allele_groups[allele_group_id]
This should be complete. Will leave the issue open for now encase Martin finds any issues.
@martinghunt Do you agree that the new output data structures are correct? If so I'll close this issue.
Looks good. I'll close it now, and can always reopen if I find anything that looks wrong.
Need quasimap to report two things at each variant:
Report the read coverage at every position of every allele. Include multiple mapping, ie if a read maps to two alleles, then increment the read coverage at the relevant positions on both alleles. So the output will be something like a list of ints for each allele (length of each list == length of that allele).
Number of reads mapping to each combination of alleles. Output will be something along the lines of a hash of allele combination -> number of reads. Only include combinations that are actually seen in the reads, meaning do not have keys where values are zero. eg if read0 maps to allele1, read1 maps to allele1, and read2 maps to allele1 and allele2, then the hash would be: {{allele1}: 2, {allele1, allele2}: 1}. In particular, read2 did not contribute to the count of {allele1} or {allele2}, and it is implicit that {allele2}: 0.