markwk / qs_ledger

Quantified Self Personal Data Aggregator and Data Analysis
MIT License

I wonder if there is a bug in apple-health-data-parser.py? #7

Closed AEDWIP closed 4 years ago

AEDWIP commented 4 years ago

qs_ledger is really helpful. It would be hard for me to use the XML version.

I wonder if I found a bug in apple-health-data-parser.py. I noticed in HeartRate.csv that the header had 8 fields, while the data had 14. Tools like unix cut fail to parse it.

I looked at the XML, and I think this is what a heart rate record looks like. It looks like Apple includes ',' in the value string. Once I figured this out I was able to work around it.

<Record type="HKQuantityTypeIdentifierHeartRate" sourceName="andrew e.’s Apple Watch" sourceVersion="5.1.3" device="&lt;&lt;HKDevice: 0x281e21c20&gt;, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch4,4, software:5.1.3&gt;" unit="count/min" creationDate="2019-07-29 16:13:30 -0800" startDate="2019-07-29 16:10:29 -0800" endDate="2019-07-29 16:10:29 -0800" value="53">
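For readers wondering where the commas come from, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It parses a copy of the record above and reads its `device` attribute. One assumption: the record is written here as a self-closing element (`/>`) so that it parses on its own; the name `record_xml` is just for illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the record above, closed with "/>" (an assumption) so it is well-formed.
record_xml = (
    '<Record type="HKQuantityTypeIdentifierHeartRate" '
    'sourceName="andrew e.’s Apple Watch" sourceVersion="5.1.3" '
    'device="&lt;&lt;HKDevice: 0x281e21c20&gt;, name:Apple Watch, '
    'manufacturer:Apple, model:Watch, hardware:Watch4,4, software:5.1.3&gt;" '
    'unit="count/min" creationDate="2019-07-29 16:13:30 -0800" '
    'startDate="2019-07-29 16:10:29 -0800" endDate="2019-07-29 16:10:29 -0800" '
    'value="53"/>'
)

record = ET.fromstring(record_xml)

# The XML parser turns the &lt; and &gt; entities back into < and >.
device = record.attrib["device"]
print(device)

# The commas live inside this single attribute, not in "value".
print(device.count(","))
```

The `<<HKDevice: ...>` text is simply the unescaped form of the `&lt;&lt;HKDevice: ...&gt;` attribute, and it carries six commas of its own.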

This is the Header from the HeartRate.csv file

sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value

This is a record from HeartRate.csv. I broke it up to figure out what was going on.

"andrew e.’s Apple Watch",
"5.1.3",
"<<HKDevice: 0x281e36490>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch4,4, software:5.1.3>",
"HeartRate",
"count/min",
2019-07-29 15:28:48 -0800,
2019-07-29 15:26:47 -0800,
2019-07-29 15:26:47 -0800,
62
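The mismatch can be reproduced with a short sketch. It assumes the record above sits on a single line in HeartRate.csv with the quoting shown, and compares a naive split on commas (roughly what `cut -d,` does) with Python's standard `csv` module, which respects the double quotes.

```python
import csv
import io

# The record above, reassembled onto one line (assumed layout of the file).
row = (
    '"andrew e.’s Apple Watch","5.1.3",'
    '"<<HKDevice: 0x281e36490>, name:Apple Watch, manufacturer:Apple, '
    'model:Watch, hardware:Watch4,4, software:5.1.3>",'
    '"HeartRate","count/min",'
    '2019-07-29 15:28:48 -0800,2019-07-29 15:26:47 -0800,'
    '2019-07-29 15:26:47 -0800,62'
)

# Naive split: every comma is a delimiter, including those inside quotes.
naive_fields = row.split(",")

# Quote-aware parse: commas inside "..." stay part of the field.
quoted_fields = next(csv.reader(io.StringIO(row)))

print(row.count(","), len(naive_fields), len(quoted_fields))
```

Under these assumptions the row contains 14 commas against 8 in the header, which may be where the 8 and 14 above come from. A quote-aware parser still sees 9 fields, the same number as the header, so the file looks like valid quoted CSV rather than parser output that is broken.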

Andy

markwk commented 4 years ago

It's possible there's a bug there. Parsers are notoriously fragile. It could be the parser, or even a change in Apple's export format. Can you please share more about your setup? Mac? Linux? Versions? Etc.

AEDWIP commented 4 years ago

Hi Markwk

I am using macOS Mojave 10.14.6 and Python 3.7.5.

Interesting. I was able to create a Pandas DataFrame from the CSV file generated by apple-health-data-parser.py without any problem.

I think this is the field in the CSV file that causes problems when I try to work with the CSV files using standard unix utilities. I am not familiar with the < > syntax. "<<HKDevice: 0x281d49bd0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch4,4, software:5.3>",

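import pandas as pd

# dataPath is the directory holding the CSV files written by apple-health-data-parser.py
df = pd.read_csv(dataPath + "/HeartRateVariabilitySDNN.csv")
print(df.columns)
df.head()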

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
| sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| andrew e.’s Apple Watch | 5.3 | <<HKDevice: 0x281d49bd0>, name:Apple Watch, ma... | HeartRateVariabilitySDNN | ms | 2019-07-30 19:37:36 -0800 | 2019-07-30 19:36:31 -0800 | 2019-07-30 19:37:36 -0800 | 20.2670 |
| andrew e.’s Apple Watch | 5.3 | <<HKDevice: 0x281d49c70>, name:Apple Watch, ma... | HeartRateVariabilitySDNN | ms | 2019-07-31 19:53:42 -0800 | 2019-07-31 19:52:36 -0800 | 2019-07-31 19:53:41 -0800 | 20.4298 |
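Pandas loads the file because its CSV reader honours the double quotes around the device field. For anyone who still wants to use tools like cut, one possible workaround is to rewrite the file with a tab delimiter. The sketch below uses only Python's standard `csv` module. The helper name `csv_to_tsv` is hypothetical, and the in-memory sample is assembled from the values shown above, with the quoting assumed to match the HeartRate.csv record.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy quoted, comma-separated rows from src to dst as tab-separated rows."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# In-memory stand-in for HeartRateVariabilitySDNN.csv (header plus one row).
sample = io.StringIO(
    'sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value\n'
    '"andrew e.’s Apple Watch","5.3",'
    '"<<HKDevice: 0x281d49bd0>, name:Apple Watch, manufacturer:Apple Inc., '
    'model:Watch, hardware:Watch4,4, software:5.3>",'
    '"HeartRateVariabilitySDNN","ms",'
    '2019-07-30 19:37:36 -0800,2019-07-30 19:36:31 -0800,'
    '2019-07-30 19:37:36 -0800,20.2670\n'
)

out = io.StringIO()
csv_to_tsv(sample, out)
tsv_lines = out.getvalue().splitlines()

# Every line now splits into the same number of tab-separated fields.
print([len(line.split("\t")) for line in tsv_lines])
```

With real files, `src` and `dst` would be file objects opened with `newline=""`. Because the fields contain no tabs, a command such as `cut -f 9` on the resulting file should then pick out the value column.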


Given that I can load the data with Pandas, this is not a problem for me right now.

Kind regards

Andy

markwk commented 4 years ago

Ok. Glad it works for you, Andy. Gonna close this since it sounds like it is not specific to python or pandas stuff.