openaq / openaq-fetch

A tool to collect data for OpenAQ platform.
MIT License
85 stars 39 forks source link

Delhi Pollution Control Committee: National Capital Territory (NCT) #1088

Closed majesticio closed 7 months ago

majesticio commented 8 months ago

This adapter adds 24 stations from Delhi (NCT). Data is retrieved using puppeteer, and can take anywhere from 1 to 3.5 minutes to fetch the data. A deployment has been added so this adapter will run on its own and not with the realtime deployment. This adapter fetches these parameters from each station: co, no2, o3 , so2, pm25 in µg/m³, and relative humidity in %. Stations are fetched dynamically.

github-actions[bot] commented 8 months ago

results:
⛔⛔⛔⛔⛔⛔
The following packages are missing:
- puppeteer is needed by /home/runner/work/openaq-fetch/openaq-fetch/src/adapters/delhi.js

majesticio commented 8 months ago

I have updated the adapter to use the latest version of puppeteer as requested. The code now reflects recent changes to the puppeteer API.

majesticio commented 8 months ago

Good catch, it should now have "puppeteer": "^22.5.0"

russbiggs commented 8 months ago

My two cents without looking at the code, but since i did the initial research on this source: Puppeteer seems like a really heavyweight solution for this source and is likely the source of the slowness. Since the data are availalbe inline in the page, there shouldnt be any need to use a headless browser and wait for all the JS and rendering. Simply fetching the page and finding the data arrays in the page content e.g.

}, {*/
  name: 'Ammonia',
   data: [84.6,88.9,104.7,100.2,97.5,100.7,99.6,94.9,83.6,61.1,53.0,42.5,37.6,34.2,36.3,45.0,55.0,54.0,52.6,59.9,59.2,62.1,58.4]
   }]

With a lightweight parser will likely be much faster and simpler.

majesticio commented 8 months ago

I updated the package-lock, I am hoping some mismatch was causing issues as it seems to run fine on my end. You have both noted this might not be the best way to fetch from this provider and I will review it going forward. @caparker Highcharts is defined within the script with puppeteer and should resolve itself if the version is correct... ie if puppeteer is not working properly it won't find / define Highcharts

caparker commented 8 months ago

I did some refactoring and after playing with it for a while I think I would agree that we dont need puppeteer.

For example, this url https://www.dpccairdata.com/dpccairdata/display/AallAdvanceSearchCconc.php?stName=TWFuZGlybWFyZw== with this payload (and you could probably through the station name in there as well) parameters: NH3 fDate: 2024-03-24 15:01 eDate: 2024-03-25 15:01 duration: 1 graphType: Line submit: Search

Will return text that includes this

        $('#container').highcharts({
            chart: {
                type: 'line'
            },
            title: {
                text: ''
            },
            subtitle: {
                text: ''
            },
            xAxis: {
                categories: ['2024-03-24 19:00:00','2024-03-24 23:00:00','2024-03-25 01:00:00','2024-03-25 04:00:00','2024-03-25 05:00:00','2024-03-25 09:00:00','2024-03-25 11:00:00','2024-03-25 12:00:00','2024-03-25 14:00:00','2024-03-25 15:00:00'],
                labels: {
                    rotation: 90
                }
            },
            yAxis: {
                title: {
                    text: ''
                }
            },
            tooltip: {
                enabled: true,
                formatter: function() {
                    return '<b>'+ this.series.name +'</b><br/>Date & Time : '+
                        this.x +'<br/><br/>Value : '+ this.y +'';
                }
            },
            plotOptions: {
                line: {
                    dataLabels: {
                        enabled: true,
                        style: {
                            textShadow: '0 0 3px white, 0 0 3px white'
                        }
                    },
                    enableMouseTracking: false
                }
            },
            series: [{
                /*name: 'Tokyo',
                data: [7.0, 6.9, 9.5, 14.5, 18.4, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
            }, {*/
                name: 'Ammonia',
                data: [41.2,40.5,41.3,39.2,39.3,41.2,41.8,42.4,41.7,41.7]               
            }]
        });
    });

And if we can extract that json part it would actually be safer because of the timestamps that are included in the object.

majesticio commented 7 months ago

The previous commit 613595b0b2a8d6183fe86ed20968e66451b52201 was returning more data for me. This was functioning on my end, however the adapter is slow and concerns have been expressed about its implementation. Closing, but may be of use for someone who can do it better.