souksili / Event-locator

A web application to locate and discover events.
MIT License
0 stars 0 forks source link

Bug in Data Format Structure When Adding PredictHQ Data to Google Sheets - Some Fields Returning Undefined in table.html and Map #22

Closed massidi closed 2 hours ago

massidi commented 2 hours ago

Description:
When integrating data from different sources such as PredictHQ into Google Sheets, we encountered an issue where some fields were returning "undefined" errors in table.html and on the map. This bug particularly affected fields like Titre de lévénement, Adresse de lévénement and Description de lévénement . Additionally, some addresses contained commas within the text, which were misinterpreted as new columns, causing data misalignment. Another issue occurred when the OpenStreetMap API was unable to find some addresses, resulting in errors for missing coordinates.


Steps to Reproduce:

  1. Fetch event data from PredictHQ and load it into Google Sheets.
  2. Display this data in table.html or use it on the map.
  3. Some fields like address or date return "undefined".
  4. Misformatted CSV data (commas in address fields) leads to errors in column alignment.
  5. OpenStreetMap API fails to retrieve coordinates for some addresses, leading to missing data on the map.

Root Cause:


massidi commented 2 hours ago

Solution:
The bug was resolved in this commit in the dev_red branch and the script update iscsv.js by matching the data structure to fit the Google Sheets format and implementing the following fixes:

  1. Manually: Handled the issue where commas inside the column were treated as new columns. We now remove commas from inside quotes before parsing, ensuring that addresses, title and description are correctly processed in a google sheet

  2. Column Count Check: Added a check in the script to ensure that each row has at least 6 columns. If the row doesn’t meet the criteria, it logs an error and skips processing the row.

    
      // Process each row and extract columns
        const events = await Promise.all(rows.map(async (row) => {
    
            const cols = row.split(',').map(col => col.trim());
    
            // Make sure there are enough columns
            if (cols.length < 6) {
                console.error('Invalid row format:', row);
                return null; // Skip this row if it's invalid
            }
3. **Date Handling:** Added a check to ensure valid date formatting. If the date is missing or incorrectly formatted, it logs a warning and skips the conversion.

// Extract and format the date (assuming date is in column 5) let formattedDate = null; if (cols[5]) { const [month, day, year] = cols[5].split('/'); if (month && day && year) { formattedDate = ${year}-${month}-${day}; } else { console.warn('Invalid date format:', cols[5]); } } else { console.warn('Missing date in row:', row); }


4. **Coordinate Fetching:** with the new data of PredictHQ API I added two new columns (`latitude ,longitude`) If latitude or longitude is missing, the script now fetches the missing coordinates using the column(`Adresse de lévénement`)  from the OpenStreetMap API. If coordinates are available from the data, it uses the existing ones.

// Create event object const event = { title: cols[1] || 'No title', // Title (column 1) address: cols[2] || 'No address', // Address (column 2) description: cols[3] || 'No description', // Description (column 3) category: cols[4] || 'No category', // Category (column 4) date: formattedDate || 'No date', // Date (converted format) lat: cols[6] || null, // Latitude (column 6) lng: cols[7] || null // Longitude (column 7) };

        // Check if lat and lng are null or empty, then fetch coordinates
        if (!event.lat || !event.lng) {
            // Fetch latitude and longitude for the address using the geocoding function
            const coords = await geocodeAddress(event.address);

            if (coords) {
                event.lat = coords.lat;
                event.lng = coords.lng;
            }
        }


6. **Default Values:** For missing fields like title, address, or description, we implemented default values such as `'No title'`, `'No address'`, etc., to handle missing data gracefully.

---

**Why:**  
The issues stemmed from the fact that the data returned by the PredictHQ API uses commas as delimiters, and some fields (like description and address...) contained commas inside the text. This caused incorrect splitting of columns. Additionally, some addresses weren’t recognized by the OpenStreetMap API, leading to missing coordinates on the map. The preprocessing steps, validation checks, and API integration solved these issues.

---

**Script Updates (csv.js):**
- **Column Count Check:** Ensures that rows have the required number of columns.
- **Date Handling:** Adds checks for valid date formats and handles missing or malformed dates.
- **Default Values:** Uses fallback values for missing fields to prevent `"undefined"` errors.
- **Filter Invalid Rows:** Skips rows with missing or invalid data and filters out null values after processing.