jbloom / SARS2_28Dec2019_Genbank_submission

analysis of SARS-CoV-2 sequence submitted to Genbank on Dec-28-2019

Resolving the discrepancy #1

Closed: zach-hensel closed this issue 10 months ago

zach-hensel commented 10 months ago

I suggest considering the possibilities that:

  1. IPBCAMS-WH-01 results from analysis of the data used to construct this assembly, plus additional data described in Ren et al (RACE and Sanger). Your own analysis of the SARS2 mutational spectrum does not suggest that two A>G mutations and one T>A mutation are likely to occur before any C>T mutation. The mutations are in both sequences because both sequences come from the same data.
  2. Onset dates can change once medical records are investigated, e.g., when the collection or hospitalization date is used as an estimate until additional data are collected. Ren et al was submitted in January 2020, and the patient in question was not transferred to the hospital where the other patients' samples were collected (as reported in Ren et al, Shen et al, and elsewhere).

A good overview of events surrounding early diagnosis, sample collection, false negative tests, and sequencing can be found in this oral history published in March 2020 - https://mp.weixin.qq.com/s/WQwuTuGvKB82R5gfAUVY4A - I think that this is worth reading closely and considering carefully. Particularly of note is that samples were split between institutions for analysis and two patients declined to give consent for sampling.

Edit: Another point about this oral history is that it plainly describes receiving sequencing results on December 27, 2019, and it was published in March 2020. So if the statement "An expert evaluation team from the NHC initially identified a new coronavirus as the cause of the epidemic" is falsified by the knowledge that patient samples were sequenced and that the sequences contained a novel coronavirus, then the timeline published at Xinhuanet had already been falsified the month before, in an article published by the main state media organization in China. However, that interpretation of the sentence is falsified many times over by all sorts of data from before January 8, 2020.

jbloom commented 10 months ago

I agree that there is a discrepancy between the IPBCAMS-WH-01 sequence and the joint WHO-China report on the mutations in the sequence. But given that the joint WHO-China report does not provide any details of its analysis, and the raw data are not (as best as I can tell) available, I think the correct thing to do is what I have done: note that there is a discrepancy between the IPBCAMS-WH-01 sequence and the joint WHO-China report.

In the absence of any raw data or an actual description of the analysis in the joint WHO-China report, I don't think the evidence supports confidently asserting that all three reported mutations are necessarily sequencing errors. I do think it should be said that some or all of them could be sequencing errors, as my current analysis does.