scCOVID-19 / COVIDPBMC

Multi-omics profiling of peripheral immune system response to SARS-CoV-2 infection
GNU General Public License v3.0

Patients with two samples #5

Closed DylanMannKrzisnik closed 2 years ago

DylanMannKrzisnik commented 2 years ago

Hello,

Upon investigating the data, I've noticed that 13 of the 130 patients have two samples, meaning that two Sample IDs map to the same Patient ID for each of those 13 patients. This brings the total number of samples to 143, consistent with the number of patients (130) and number of samples (143) reported in the main text of Stephenson et al. (2021).

In Supplementary Table 2, the "Resample" column contains the value "Initial" for most rows, except for the last few rows, where the value is "Resample". Moreover, for the rows bearing the "Resample" value, the "Collection day" values range from D7 to D28, whereas "Collection day" is D0 for all other rows. These entries suggest pretty clearly that a subset of patients was sampled again after the initial D0 visit.
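For reference, the check described above can be sketched roughly as follows. The rows are toy data, and the key names (`sample_id`, `patient_id`, `resample`, `collection_day`) are assumptions standing in for the actual column headers of Supplementary Table 2, which may differ.

```python
from collections import Counter

# Toy rows standing in for Supplementary Table 2. The keys and values are
# illustrative assumptions, not the real column names or data.
rows = [
    {"sample_id": "S1", "patient_id": "P1", "resample": "Initial", "collection_day": "D0"},
    {"sample_id": "S2", "patient_id": "P2", "resample": "Initial", "collection_day": "D0"},
    {"sample_id": "S3", "patient_id": "P3", "resample": "Initial", "collection_day": "D0"},
    {"sample_id": "S4", "patient_id": "P2", "resample": "Resample", "collection_day": "D7"},
    {"sample_id": "S5", "patient_id": "P3", "resample": "Resample", "collection_day": "D28"},
]


def patients_with_multiple_samples(rows):
    """Return {patient_id: sample_count} for patients with more than one sample."""
    counts = Counter(row["patient_id"] for row in rows)
    return {patient: n for patient, n in counts.items() if n > 1}


multi = patients_with_multiple_samples(rows)
n_patients = len({row["patient_id"] for row in rows})
n_samples = len(rows)

# Every sample beyond a patient's first contributes to the gap between the
# number of samples and the number of patients.
extra_samples = sum(n - 1 for n in multi.values())
assert n_samples == n_patients + extra_samples
```

With the real table, the same bookkeeping would be expected to give 130 patients, 13 of them with two samples, and 143 samples in total, provided the assumed keys are mapped to the actual headers.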

Is there any documentation explaining the criteria or process for resampling, or any other information addressing the mismatch between the number of samples (143) and the number of patients (130)? Part of the reason I am asking is to better understand the batch correction process, specifically whether it is patient-based or sample-based (the text suggests the latter, but I just want to verify).

Your help is (once again) very much appreciated!

MikeDMorgan commented 2 years ago

Hi @DylanMannKrzisnik - the distinction between a sample (a discrete biological sample) and a donor (the patient/control giving the blood sample) is an important one. Several individuals were sampled at multiple time points as part of a related study that used these samples (Bergamaschi et al., https://www.sciencedirect.com/science/article/pii/S1074761321002168).

The batch correction for figure 1 was, IIRC, sample based. Please note that almost all downstream analyses were restricted to just the samples collected at day 0.

DylanMannKrzisnik commented 2 years ago

I see. It seems, then, that these biological replicates do not quite have an explicit role in the study of Stephenson et al. (2021). As you say, since most analyses were restricted to samples collected at day 0, we could perhaps disregard samples not taken at day 0 and enforce a 1-to-1 match between sample and patient for all patients.
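As a minimal sketch of that filtering step, assuming the metadata is available as a list of rows: the key names (`sample_id`, `patient_id`, `collection_day`) and the helper `day0_one_to_one` are hypothetical, and the toy rows are not real data.

```python
# Toy metadata rows. Keys and values are illustrative assumptions only.
rows = [
    {"sample_id": "S1", "patient_id": "P1", "collection_day": "D0"},
    {"sample_id": "S2", "patient_id": "P2", "collection_day": "D0"},
    {"sample_id": "S3", "patient_id": "P3", "collection_day": "D0"},
    {"sample_id": "S4", "patient_id": "P2", "collection_day": "D7"},
    {"sample_id": "S5", "patient_id": "P3", "collection_day": "D28"},
]


def day0_one_to_one(rows):
    """Keep only day-0 samples and return a {patient_id: sample_id} mapping.

    Raises ValueError if any patient still has more than one day-0 sample,
    since the 1-to-1 match would then not hold.
    """
    mapping = {}
    for row in rows:
        if row["collection_day"] != "D0":
            continue  # drop follow-up samples (D7 to D28 in the table)
        patient = row["patient_id"]
        if patient in mapping:
            raise ValueError(f"patient {patient} has more than one day-0 sample")
        mapping[patient] = row["sample_id"]
    return mapping


mapping = day0_one_to_one(rows)
```

Raising an error rather than silently keeping the first sample makes it obvious if the day-0 restriction alone does not give one sample per patient.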

Feel free to point out if I've misunderstood anything. Thank you!