red-4 / curious-moon

The Repository for the book, A Curious Moon, which you can purchase from https://bigmachine.io.

cda.csv row count discrepancy #69

Open jjturner opened 2 years ago

jjturner commented 2 years ago

cda.csv initially downloaded when obtaining the book and result of my COPY command:

COPY operation as illustrated in the book:

robconery commented 2 years ago

I think something went off the rails a little bit with 1.1. I might have to revert.

greywidget commented 2 years ago

I agree with the above counts (I also get 440510) but think this is more than just a row count discrepancy.

If you follow links in the pdf: Ring Dust | Calculating Cassini's Speed and then page down 3 pages you will see some SQL:

select time_stamp,
  x_velocity,
  y_velocity,
  z_velocity,
  sqrt(
    (x_velocity * x_velocity) +
    (y_velocity * y_velocity) +
    (z_velocity * z_velocity)
  )::numeric(10,2) as v_kms
from cda.impacts
where x_velocity <> -99.99;

Which produces the data shown on the next page of the pdf, for which the first two rows have the following timestamps:

I don't believe I have this data in my file. I've downloaded the data several times by following the Archives for the Cassini Mission link at the red4 archive both on Windows and Mac. And I've unzipped it with several utilities.

The cda.csv file has 440510 data rows, and that is the count that ends up in my import.cda and cda.impacts tables.

By my calculation, a timestamp with a date of 2005-04-04 should have an impact_event_time in cda.csv that begins with 2005-094 but there is no such text in cda.csv.
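As a quick cross-check of that day-of-year arithmetic, here is a minimal Python sketch. The helper name `day_of_year_prefix` is made up for illustration, and it assumes the `impact_event_time` values start with a four-digit year, a hyphen, and a zero-padded ordinal day, as the `2005-094` prefix above suggests.

```python
from datetime import date


def day_of_year_prefix(d: date) -> str:
    """Return 'YYYY-DDD': the year plus the zero-padded day of the year."""
    # %j is the day of the year as a zero-padded three-digit number (001-366).
    return d.strftime("%Y-%j")


# 2005 is not a leap year: 31 (Jan) + 28 (Feb) + 31 (Mar) + 4 = day 94.
print(day_of_year_prefix(date(2005, 4, 4)))  # 2005-094
```

So 2005-04-04 does map to a `2005-094` prefix, which is the text to look for in cda.csv.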

The earliest data I can find in my downloaded cda.csv is for 2005-01-01 and if I run the following SQL:

with t1 as (
  select time_stamp,
    x_velocity,
    y_velocity,
    z_velocity,
    sqrt(
      (x_velocity * x_velocity) +
      (y_velocity * y_velocity) +
      (z_velocity * z_velocity)
    )::numeric(10,2) as v_kms
  from cda.impacts
  where x_velocity <> -99.99
)
select * from t1
order by time_stamp;

I get data that exactly matches that shown in closed issue #43

I wonder if someone else could check their download of cda.csv and confirm or deny the presence of data for 2005-04-04? I would like to be able to get data that matches what is shown in the pdf if I am to carry on with the rest of the tutorial.
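For anyone who wants to run that check without importing the file first, the following is a rough Python sketch. The function names are hypothetical, and it makes no assumption about which column holds `impact_event_time`: it simply counts lines that contain the text anywhere. The file encoding is also an assumption, so undecodable bytes are replaced rather than raising an error.

```python
from typing import Iterable


def count_lines_containing(lines: Iterable[str], needle: str) -> int:
    """Count the lines that contain `needle` anywhere in their text."""
    return sum(1 for line in lines if needle in line)


def count_in_file(path: str, needle: str, encoding: str = "utf-8") -> int:
    """Stream a (possibly large) text file and count lines containing `needle`."""
    # Iterating over the file handle reads one line at a time, so the whole
    # file never has to fit in memory.
    with open(path, encoding=encoding, errors="replace") as fh:
        return count_lines_containing(fh, needle)
```

Calling `count_in_file("cda.csv", "2005-094")` from the directory that holds the download returns 0 if the file contains no data for that day.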

robconery commented 2 years ago

I accessed my own data archives and can confirm that I have the same count as both of you for the CDA csv file. I remember that when I was preparing the downloads I was worried about file sizes, so I was going to trim columns and records that weren't needed (the CDA data is gigantic), which evidently is what went into the first release. The second, however, appears to have more records in it.

I'm still trying to figure out what's going on and I will! I normally leave myself exhaustive notes about the choices I made but I can't seem to locate anything for the CDA - mostly because I use the INMS data for the rest of the book.

To be clear: the choice was gigs and gigs of CDA data that we then pare down, or me just clipping and dropping what we need... not an easy choice and now we can see why :).

Stay tuned...

greywidget commented 2 years ago

Thanks @robconery appreciate you looking at this.

Yeah I can see that the CDA extract process changed over time, which is a good thing! I don't really want to be pulling down all that RAW data :-)

nice one.