Closed by jeromekelleher 3 months ago
This is a working example:
{
"Artic_primer_version": ".",
"Collection_date": "2020-01-01",
"Country": "India",
"Date_tree_order": "Date",
"Experiment": "SRX10971427",
"Genbank_N": ".",
"Genbank_accession": ".",
"Genbank_other_runs": ".",
"Genbank_pangolin": ".",
"Genbank_scorpio": ".",
"Genbank_tree_name": ".",
"In_Viridian_tree": "T",
"In_intersection": "F",
"Platform": "ION_TORRENT",
"Region": "Vadodara, Gujarat",
"Sample": "SAMN19315526",
"Study": "PRJNA625669",
"Viridian_N": 3,
"Viridian_amplicon_scheme": "COVID-AMPLISEQ-V1",
"Viridian_cons_het": 0,
"Viridian_cons_len": 29832,
"Viridian_pangolin": "B.1.36.29",
"Viridian_result": "PASS",
"Viridian_scorpio": ".",
"date": "2020-01-01",
"date_submitted": "2021-05-25",
"sc2ts": {
"mutations": [
"222C>T",
"241C>T",
"3037C>T",
"3267C>T",
"4683C>T",
"5986C>T",
"6471C>T",
"9870C>T",
"10486A>G",
"14408C>T",
"14652T>C",
"18877C>T",
"21034C>T",
"22444C>T",
"22627G>A",
"22882T>G",
"23403A>G",
"25563G>T",
"26173G>T",
"26735C>T",
"27147G>T",
"28183G>T",
"28277T>C",
"28854C>T"
],
"path": [
"parent=1:left=0-right=29904"
],
"qc": {
"num_masked_sites": 130,
"original_base_composition": {
"A": 8894,
"C": 5466,
"G": 5850,
"N": 106,
"T": 9587
},
"original_md5": "587f101275d647fd492027b0855e893f"
}
},
"strain": "SRR14631544"
}
We should also include whether a mutation is a reversion or immediate reversion, I think.
Also, there is probably a prettier way to show the copying paths than the format "parent=1:left=0-right=29904".
I'd use a different format for the path all right; that one would be awkward. It would probably be as well to store the mutations as objects too. Let's not worry about the storage too much.
Do you mean convert the MatchMutation and PathSegment objects to some form that the metadata encoder is okay with? I am trying this string repr approach because I couldn't just pass in a list of MatchMutation and PathSegment objects to the metadata JSON.
If you add an asdict function to them, you should be able to output to JSON easily. Look for other asdict implementations.
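A minimal sketch of the asdict suggestion, assuming MatchMutation and PathSegment can be modelled as dataclasses. The classes below are hypothetical stand-ins, not the actual sc2ts definitions; their field names are taken from the JSON output shown in this thread, and the helper `sc2ts_metadata` is made up for illustration.

```python
import dataclasses
import json


@dataclasses.dataclass
class PathSegment:
    # Hypothetical stand-in for the real class; fields mirror the JSON output.
    left: int
    right: int
    parent: int

    def asdict(self):
        # Plain dict of ints, which the JSON encoder accepts directly.
        return dataclasses.asdict(self)


@dataclasses.dataclass
class MatchMutation:
    # Hypothetical stand-in for the real class; fields mirror the JSON output.
    site_id: int
    site_position: int
    derived_state: str
    inherited_state: str
    is_reversion: bool
    is_immediate_reversion: bool

    def asdict(self):
        return dataclasses.asdict(self)


def sc2ts_metadata(path, mutations):
    """Build a JSON-serialisable dict from a copying path and its mutations."""
    return {
        "path": [segment.asdict() for segment in path],
        "mutations": [mutation.asdict() for mutation in mutations],
    }


path = [PathSegment(left=0, right=29904, parent=1)]
mutations = [
    MatchMutation(
        site_id=144,
        site_position=222,
        derived_state="T",
        inherited_state="C",
        is_reversion=False,
        is_immediate_reversion=False,
    )
]
encoded = json.dumps(sc2ts_metadata(path, mutations), sort_keys=True)
```

Converting at the boundary like this keeps the objects themselves unchanged and hands the metadata encoder only plain dicts, lists, ints, strings and booleans.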
It looks like this now.
"sc2ts": {
"mutations": [
{
"derived_state": "T",
"inherited_state": "C",
"is_immediate_reversion": false,
"is_reversion": false,
"site_id": 144,
"site_position": 222
},
...
{
"derived_state": "T",
"inherited_state": "C",
"is_immediate_reversion": false,
"is_reversion": false,
"site_id": 28472,
"site_position": 28854
}
],
"path": [
{
"left": 0,
"parent": 1,
"right": 29904
}
],
I thought I had to modify a test in test_inference.py, but the tests ran fine without it.
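For reading the structured records back, here is a small sketch that rebuilds the compact "222C>T" labels from the first example and picks out reversions. The helper names `mutation_label` and `reversion_labels` are hypothetical, and the input dict is just the two mutation records shown above, not a full metadata entry.

```python
def mutation_label(mutation):
    """Format a mutation record as <position><inherited>><derived>."""
    return (
        f"{mutation['site_position']}"
        f"{mutation['inherited_state']}>{mutation['derived_state']}"
    )


def reversion_labels(mutations):
    """Labels of the mutations flagged as reversions."""
    return [mutation_label(m) for m in mutations if m["is_reversion"]]


# The two mutation records and the single path segment shown in the thread.
sc2ts_md = {
    "mutations": [
        {
            "derived_state": "T",
            "inherited_state": "C",
            "is_immediate_reversion": False,
            "is_reversion": False,
            "site_id": 144,
            "site_position": 222,
        },
        {
            "derived_state": "T",
            "inherited_state": "C",
            "is_immediate_reversion": False,
            "is_reversion": False,
            "site_id": 28472,
            "site_position": 28854,
        },
    ],
    "path": [{"left": 0, "parent": 1, "right": 29904}],
}

labels = [mutation_label(m) for m in sc2ts_md["mutations"]]
reverted = reversion_labels(sc2ts_md["mutations"])
```

With objects in the metadata, the short string form becomes a display choice rather than the stored format.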
Keep the list of mutations and the original copying node for every sample as QC metadata. It gets too hard to reason about this after the fact, when layers of tree building are placed on top.