jabrell / eutl_scraper

MIT License

Malformed output in transactions.csv #1

Status: Closed (KSilkThread closed this issue 2 years ago)

KSilkThread commented 2 years ago

Amazing work on the eutl scraper, @jabrell! I ran the scraper successfully on my local machine and analyzed the output.

I ran the scraper twice and got the same issues both times. Below are the first two lines (header and first data row) of the parsed transactions.csv:

transactionDate,transferringRegistry,transferringAccountType,transactionStatus,acquiringAccountType,acquiringAccountIdentifier,transferringAccountName,transferringAccountIdentifier,amount,acquiringRegistry,acquiringAccountIdentifierquiringAccountHolderName,transactionID,acquiringAccountName,transferringAccountHolderName,transactionType
2018-04-30 21:50:06.373,Iceland,100,Completed,100,5016380,Bluebird Cargo,5017996,16402,European Commission,European Commission,EU458292,EU Allowance deletion,Bluebird,10-2

Compared with the datasets downloaded from the euets.info website, the output file "transactions.csv" appears to have three issues:

  1. The header is missing characters and a comma, so two column names are fused into one (see "acquiringAccountIdentifierquiringAccountHolderName").
  2. The payload and the header of the CSV are not aligned: once the fused name is split, the header has 16 columns, but the payload has only 15.
  3. Some columns are mapped to the wrong values, for example "acquiringAccountIdentifier" and "acquiringAccountName".
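A quick way to catch the first two kinds of problem is to parse the file and compare the header against an expected list of column names and every data row against the header's field count. The sketch below uses only Python's standard csv module; EXPECTED_COLUMNS and check_csv are hypothetical names, and the column list is an assumption built from the header above, not the scraper's documented schema.

```python
import csv
import io

# Hypothetical expected schema, assembled from the header shown above with the
# fused name split out; the scraper's real column list may differ.
EXPECTED_COLUMNS = [
    "transactionID", "transactionType", "transactionDate", "transactionStatus",
    "transferringRegistry", "transferringAccountType", "transferringAccountName",
    "transferringAccountIdentifier", "transferringAccountHolderName",
    "acquiringRegistry", "acquiringAccountType", "acquiringAccountName",
    "acquiringAccountIdentifier", "acquiringAccountHolderName", "amount",
]


def check_csv(text, expected_columns):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    # Header names that are not in the expected schema (e.g. fused names).
    for name in header:
        if name not in expected_columns:
            problems.append(f"unexpected header name: {name}")
    # Expected names that never appear in the header.
    for name in expected_columns:
        if name not in header:
            problems.append(f"missing header name: {name}")
    # Every data record must have exactly as many fields as the header.
    # Records are numbered from 1, with the header as record 1.
    for record_no, row in enumerate(data, start=2):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: {len(row)} fields, header has {len(header)}"
            )
    return problems
```

The field-count check only detects rows that are too long or too short. Values that land under the wrong column name (issue 3) keep the count intact, so they still need a spot check against a reference dataset such as the one on euets.info.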

Can you confirm these issues in your local output? I was able to reproduce them on two different machines (Ubuntu and macOS).

jabrell commented 2 years ago

Hi @KSilkThread, thanks for reporting! I was able to confirm the issue on my Windows system and have just fixed it.
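The thread does not say how the bug was fixed. One common way to keep header and values aligned, sketched here under the assumption of a Python output step and not as the scraper's actual fix, is to write each record as a dictionary with csv.DictWriter and one fixed list of column names. The function name write_transactions is hypothetical.

```python
import csv
import io


def write_transactions(records, fieldnames):
    """Serialize dict records to CSV text with a fixed column order.

    csv.DictWriter looks up each value by column name, so a value can only
    be written under its own header, regardless of the key order in the dict.
    With extrasaction="raise" (the default), a record containing a key that
    is not in fieldnames raises ValueError instead of shifting columns.
    Keys missing from a record are written as restval, an empty string here.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",
        extrasaction="raise",
        lineterminator="\n",
    )
    writer.writeheader()  # header is generated from the same fieldnames list
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Because the header and every row are generated from the same fieldnames list, a typo or missing comma in a hand-written header string cannot occur, and an unexpected key fails loudly rather than silently misaligning the output.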

Concerning the comparison with the data available on euets.info, I should mention that I previously fetched that data "by hand". This project is meant to make the whole procedure more transparent, but it will likely not replicate the data on euets.info exactly. For the upcoming update in May, I will rely on (and further develop) this project.