move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/

[Bug] NGPVAN download_saved_list() only returning partial results #988

Closed ghost closed 6 months ago

ghost commented 7 months ago

When retrieving a saved list from VAN using download_saved_list(), only a partial list of VanIDs is returned.

Detailed Description

We have a saved list with 25 records in VAN. When we call the Parsons download_saved_list() function and convert the result to a DataFrame or CSV, only 18 of the 25 VanIDs are returned.

To Reproduce

This is the query used when the bug occurs.

### API KEY SETUP

### IMPORT PACKAGES
import pandas as pd

#read api key file and assign variables
df = pd.read_csv('/key.csv')
parsons_api_key = df.loc[df['API'] == 'parsons']['KEY'].iloc[0]   

from parsons import VAN
van = VAN(api_key= parsons_api_key, db='MyVoters') 

##### CONFIG

config_vars = {
    # VAN
    "VAN_API_KEY": parsons_api_key,
    "VAN_DB_NAME": "MyVoters",  
    "VAN_FOLDER_ID": "1234", 
    "VAN_SAVED_LIST_ID": " 9876954"
}

#### CODE

import os  
from parsons import VAN  
from parsons import logger  

# Download the saved list as a DataFrame and sort it by VanID
saved_list_df = van.download_saved_list("1073953").to_dataframe().sort_values(by=['VanID'])

saved_list_df

Your Environment

Additional Context

Add any other context about the problem here.

Priority

Please indicate whether fixing this bug is high, medium, or low priority for you. If the issue is time-sensitive for you, please let us know when you need it addressed by.

Medium

matthewkrausse commented 6 months ago

I am able to reproduce this. When I run the get_saved_list method, it returns some metadata:

{'description': None, 'listCount': 4994, 'doorCount': 3667, 'isSuppressed': None, 'savedListId': 722732, 'name': 'List Name'}

But when I use download_saved_list, I get only 4866 records.

The source code of the method looks clean. It just creates the export job, downloads the CSV, and converts it to a Table.

def download_saved_list(self, saved_list_id):
    """
    Download the vanids associated with a saved list.

    `Args:`
        saved_list_id: int
            The saved list id.
    `Returns:`
        Parsons Table
            See :ref:`parsons-table` for output options.
    """

    ej = ExportJobs(self.connection)
    job = ej.export_job_create(saved_list_id)

    if isinstance(job, tuple):
        return job
    else:
        return Table.from_csv(job['downloadUrl'])

I know that VAN always has weird suppressions going on. I imagine that is what is happening here. Anybody have any thoughts?
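For anyone who wants to check their own list, here is a minimal sketch of the comparison described above: it reads listCount from get_saved_list and compares it with the number of rows that download_saved_list returns. The helper name compare_saved_list_counts is hypothetical (it is not part of Parsons), and the sketch assumes download_saved_list returned a Parsons Table rather than the tuple case shown in the source.

```python
def compare_saved_list_counts(van, saved_list_id):
    """Return (expected, downloaded, missing) record counts for a saved list.

    `van` is assumed to be a parsons VAN connector, or any object that
    exposes get_saved_list() and download_saved_list() the same way.
    This helper is illustrative only; it is not part of Parsons.
    """
    # Metadata reported by VAN, e.g. {'listCount': 4994, ...}
    meta = van.get_saved_list(saved_list_id)
    expected = meta["listCount"]

    # Rows actually delivered by the export job
    table = van.download_saved_list(saved_list_id)
    downloaded = table.num_rows

    return expected, downloaded, expected - downloaded


# Usage (requires a real API key, so it is left commented out):
# from parsons import VAN
# van = VAN(api_key="...", db="MyVoters")
# print(compare_saved_list_counts(van, 722732))
```

A nonzero third value means the export dropped records relative to what the list metadata reports.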

matthewkrausse commented 6 months ago

Also when I export the list from VAN using the UI, I get 4881 records...
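To see exactly which records differ between the UI export and the API download, something like the sketch below could help. It assumes both files have been saved locally as CSVs and that the ID column is named VanID in each; the UI export may use a different header, so the column names are parameters. The function names are hypothetical.

```python
import csv


def read_van_ids(csv_path, column="VanID"):
    """Read the set of IDs from one CSV column (the column name is an assumption)."""
    with open(csv_path, newline="") as f:
        return {
            row[column].strip()
            for row in csv.DictReader(f)
            if row.get(column)
        }


def missing_van_ids(ui_export_path, api_export_path,
                    ui_column="VanID", api_column="VanID"):
    """Return the IDs present in the UI export but absent from the API download."""
    ui_ids = read_van_ids(ui_export_path, ui_column)
    api_ids = read_van_ids(api_export_path, api_column)
    # IDs are compared as strings, so the result is sorted lexicographically
    return sorted(ui_ids - api_ids)
```

Looking up a few of the missing IDs in VAN should show whether they share a status that the API key is filtering out.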

ghost commented 6 months ago

Thanks for reviewing @matthewkrausse! If the list is already created, without suppressions applied, would the API still suppress data in the list pull?

matthewkrausse commented 6 months ago

I'm not sure. I would suggest reaching out to VAN support with the question, because this is on their end. We aren't doing anything to the result, just loading the returned CSV into a Parsons Table.

ghost commented 6 months ago

Thanks! I reached out to VAN support, and the cause was indeed the API settings: the API was not configured to return records with a status of registrant or dropped. This default behavior has since been updated, so those records are now returned in a saved list.