sobrus / FastLacellsGenerator

Fast lacells.db database generator script for LocalGSMBackend by n76
GNU General Public License v3.0

dealing with CSV files #19

Open Nichtraucher opened 1 year ago

Nichtraucher commented 1 year ago

Hello there,

I'd like to use Local NLP Backend, but it requires the imported data as a CSV file instead of a database file. I can download the OpenCelliD dataset as a CSV file directly from their website, but I can't deselect the useless 3G data, and I can't edit the file manually because the spreadsheet editor complains that the file is too large. :-/

I noticed that one can sign up for a free geolocation API at Unwired Labs. Could FastLacellsGenerator be amended to access this data? I have no idea how they provide their datasets...

cheers

IzzySoft commented 1 year ago

You can define via the config which data to accept:

RADIO="GSM|UMTS|LTE"

Would excluding UMTS help in your case? Then the resulting database would just have GSM and LTE cells.
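Here is a minimal sketch of how such a RADIO value can work as a row filter. It assumes the value is applied as an extended-regex alternation on the first CSV column, and flg's actual expression may differ. The sample rows and file names are made up. The rows follow the OpenCelliD column order.

```shell
#!/bin/sh
# Made-up sample rows in OpenCelliD column order (radio,mcc,net,area,cell,...)
cat > sample.csv <<'EOF'
GSM,262,1,100,1001,0,13.40,52.52,1000,5,1,0,0,0
UMTS,262,1,100,2002,0,13.41,52.53,1000,5,1,0,0,0
LTE,262,1,100,3003,0,13.42,52.54,1000,5,1,0,0,0
EOF

# UMTS dropped from the accepted radio types
RADIO="GSM|LTE"

# Keep only rows whose first field is one of the accepted types
# (assumption: this mirrors how the config value is used)
grep -E "^(${RADIO})," sample.csv > filtered.csv

cat filtered.csv
```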

Not sure which API exactly you mean, but according to this function, some unwired server is already used to obtain the OCI data. And via the config, you can filter that pretty well.

Nichtraucher commented 1 year ago

Would excluding UMTS help in your case? You can define via the config which data to accept:

I know how to do that, but that's not the point. The above-mentioned backend requires the data as a CSV file, not as a database file. lacells-creator does create a CSV file, but it doesn't support choosing the network type, and the same goes for downloading the dataset directly from the OpenCelliD website. Deleting the rows for unneeded network types manually isn't possible because spreadsheet editors can't handle files of this size. Can FastLacellsGenerator be amended to let users choose between creating a CSV file and a database file?

Not sure which API exactly you mean [...] some unwired server is already used to obtain the OCI data

Unwired Labs offers two geolocation products/APIs with different datasets. The OpenCelliD API gives access to data that is more or less community-sourced, whereas the Unwired Labs API provides access to a proprietary dataset. As far as I understand, the limitations for end users are the same as for the OpenCelliD API (I'm not sure, though; their website lacks some information). Both datasets are probably served from the same server.

IzzySoft commented 1 year ago

Can FastLacellsGenerator be amended to let users choose between creating a CSV-file and a database-file?

Should be possible. It would need someone to put in the effort, though. Basically, the filtered *.csv files are stored at least temporarily. So all that would be needed is another if-then-else that either imports them into SQLite and deletes them afterwards, or simply keeps them (moving them from their temp location to a final one). See the end of the flg script file.
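The following is one possible shape for that if-then-else, offered only as a sketch. OUTPUT_FORMAT, TMP_CSV, FINAL_CSV and import_to_sqlite are hypothetical names. The stub function stands in for the import step the script already performs.

```shell
#!/bin/sh
# Hypothetical names: none of these exist in flg as-is
OUTPUT_FORMAT="csv"          # "csv" keeps the file, anything else imports into SQLite
TMP_CSV="tmp_cells.csv"      # stands in for the filtered temporary CSV
FINAL_CSV="lacells.csv"      # where a kept CSV should end up

# Made-up filtered row so the sketch runs on its own
printf 'GSM,262,1,100,1001,0,13.40,52.52,1000,5,1,0,0,0\n' > "$TMP_CSV"

# Placeholder for whatever the script currently does to load the CSV into SQLite
import_to_sqlite() {
    echo "would import $1 into the SQLite database here"
}

if [ "$OUTPUT_FORMAT" = "csv" ]; then
    # Keep the filtered data: move it from the temp location to its final one
    mv "$TMP_CSV" "$FINAL_CSV"
else
    # Current behaviour: import, then delete the temporary CSV
    import_to_sqlite "$TMP_CSV"
    rm -f "$TMP_CSV"
fi
```

Defaulting to the existing SQLite path for any value other than "csv" would keep the script's current behaviour for users who never touch the new option.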

Unwired Labs offers two geolocation products/APIs with different datasets.

I unfortunately don't know anything about that second set, sorry.

sobrus commented 1 year ago

If you only need to filter the CSV file, you can easily do it using wget, cat and grep (just like flg does in its data download step).

grep -E "^(radio|GSM|LTE)," input.csv > output_file.csv

(haven't tested it but it should be something like this)
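Because the command is untested, it is worth trying it on a handful of made-up lines before running it on the full dump. This sketch assumes the dump starts with the OpenCelliD header line. It uses a pattern that keeps that header along with the GSM and LTE rows. The counting pipeline at the end is just one way to confirm which radio types remain.

```shell
#!/bin/sh
# Made-up sample with the OpenCelliD header line
cat > input.csv <<'EOF'
radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
GSM,262,1,100,1001,0,13.40,52.52,1000,5,1,0,0,0
UMTS,262,1,100,2002,0,13.41,52.53,1000,5,1,0,0,0
LTE,262,1,100,3003,0,13.42,52.54,1000,5,1,0,0,0
EOF

# Keep the header line plus GSM and LTE rows (UMTS rows are dropped)
grep -E "^(radio|GSM|LTE)," input.csv > output_file.csv

# Count rows per radio type, skipping the header, to confirm UMTS is gone
tail -n +2 output_file.csv | cut -d, -f1 | sort | uniq -c
```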

Or, maybe even better, just export the resulting SQLite database to a CSV file:

https://www.sqlitetutorial.net/sqlite-export-csv/

This way you can also select the fields, and the field order, that Local NLP Backend expects.
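Here is a sketch of the export using the sqlite3 command-line shell, whose -header and -csv options print query results as CSV with a header line. The table and column names of lacells.db are an assumption: a cells table with mcc, mnc, lac, cid, longitude, latitude, accuracy and samples. For that reason the example builds a small demo database with that layout. Check the real file with `.schema` before adapting the SELECT. If the real file has no radio column either, the radio type cannot be recovered this way.

```shell
#!/bin/sh
# Demo database with an ASSUMED lacells.db-like layout; check the real one with:
#   sqlite3 lacells.db '.schema'
rm -f demo.db
sqlite3 demo.db <<'EOF'
CREATE TABLE cells(mcc INTEGER, mnc INTEGER, lac INTEGER, cid INTEGER,
                   longitude REAL, latitude REAL, accuracy INTEGER, samples INTEGER);
INSERT INTO cells VALUES (262, 1, 100, 1001, 13.40, 52.52, 1000, 5);
EOF

# -header prints the column names first, -csv selects CSV output.
# The aliases rename columns to OpenCelliD-style names and set their order;
# adjust them to whatever the backend actually expects.
sqlite3 -header -csv demo.db \
  'SELECT mcc, mnc AS net, lac AS area, cid AS cell,
          longitude AS lon, latitude AS lat, accuracy AS "range", samples
   FROM cells;' > export.csv

cat export.csv
```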

Nichtraucher commented 1 year ago

Well, I didn't read the comments in the config file properly and only just noticed that the script can be set to keep the CSV file in the tmp folder. Sorry about that! :-/

However, importing these files into the app doesn't work because they lack the header line with the required column names (radio, mcc, mnc, etc.). I've opened an issue about it.

I noticed that the script does insert them into the database file, correct? Perhaps it could also put them into the CSV file?

IzzySoft commented 1 year ago

I noticed that the script does insert them into the database file, correct?

In the database, the columns exist thanks to the CREATE TABLE statement, so you just have to set up the INSERT statement appropriately. As for the CSV files, you could simply add the proper header line at the top. Since the CSV files are not meant to be kept, but are just a temporary means of feeding the data into the database, that is intentionally not done here: a header line would break the INSERT, and once the INSERT is done, the CSV is removed anyway.

But yes, that could probably be changed. Going by the referenced issue:

echo "radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal" > ${OCI_FILE}.new
cat ${OCI_FILE} >> ${OCI_FILE}.new

and you'd have a valid CSV for your purpose. Put that before the rm at the end of the script and move the resulting file where you want it to be (or replace ${OCI_FILE}.new accordingly). Though if I understand you correctly, you then wouldn't need the database file at all – which means the "proper implementation" would be to make that optional, too.
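Here is a sketch of that tail end of the script. It assumes OCI_FILE holds the path of the filtered temporary CSV. DEST and the sample row are hypothetical. The sketch writes to a temporary name and renames at the end, which avoids leaving a half-written file at the destination if something fails midway.

```shell
#!/bin/sh
# Hypothetical stand-ins: OCI_FILE for the script's filtered temp CSV, DEST for the final path
OCI_FILE="oci_filtered.csv"
DEST="lacells_with_header.csv"
printf 'GSM,262,1,100,1001,0,13.40,52.52,1000,5,1,0,0,0\n' > "$OCI_FILE"

HEADER="radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal"

# Header first, then the data; rename only once the whole file has been written
{ echo "$HEADER"; cat "$OCI_FILE"; } > "${DEST}.tmp" && mv "${DEST}.tmp" "$DEST"

# Stands in for the rm at the end of the script
rm -f "$OCI_FILE"
```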

Nichtraucher commented 1 year ago

Though if I understand you correctly, you then wouldn't need the database file at all – which means the "proper implementation" would be to make that optional, too.

Correct.

In the meantime, the backend developer has amended the backend to accept CSV files without headers. I'm not sure whether this issue still needs to be pursued.

IzzySoft commented 1 year ago

I don't know; I'm just a minor contributor here. If there is need/demand for it, why not implement it? It wouldn't be a huge task (I just don't have the time for it now). @sobrus needs to say whether it would be accepted.