tagbase / tagbase-server

tagbase-server is a data management web service for working with eTUFF and nc-eTAG files.
https://oiip.jpl.nasa.gov/doc/OIIP_Deliverable7.4_TagbasePostgreSQLeTUFF_UserGuide.pdf
Apache License 2.0

fswatch needs to account for empty files created by sftp clients #211

Closed lewismc closed 1 year ago

lewismc commented 1 year ago

When a file is copied from the local machine into staging_data with cp, ingestion works fine.

cp /path/to/file/159903_2012_117464_eTUFF.txt staging_data

tagbase_server_1  | 2023-04-04 05:59:57,076 - INFO - /tmp/159903_2012_117464_eTUFF.txt
tagbase_server_1  | 2023-04-04 05:59:57,077 - INFO - etuff ingestion queue: ['/tmp/159903_2012_117464_eTUFF.txt']
tagbase_server_1  | 2023-04-04 05:59:57,087 - INFO - Processing etuff file: /tmp/159903_2012_117464_eTUFF.txt
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{
fswatch_1         |   "code": "200",
fswatch_1         |   "elapsed": "0.01",
fswatch_1         |   "message": "Asynchronously ingesting 1 file(s) into Tagbase DB."
fswatch_1         | }
100   98k  100   109  100   98k   6411  5772k --:--:-- --:--:-- --:--:-- 5779k
tagbase_server_1  | 2023-04-04 05:59:57,106 - INFO - Successful INSERT of '/tmp/159903_2012_117464_eTUFF.txt' into 'submission' table.
tagbase_server_1  | 2023-04-04 05:59:57,149 - INFO - Built raw 'proc_observations' data structure from 2181 observations in: 0.04 second(s)
tagbase_server_1  | 2023-04-04 05:59:57,162 - INFO - Built Pandas DF from 2181 records. Time elapsed: 0.01 second(s)
tagbase_server_1  | 2023-04-04 05:59:57,187 - INFO - Copied Pandas DF to StringIO memory buffer. Time elapsed: 0.02 second(s)
tagbase_server_1  | 2023-04-04 05:59:57,187 - INFO - Copying memory buffer to 'proc_observations' and executing 'data_migration' TRIGGER.
tagbase_server_1  | 2023-04-04 05:59:57,276 - INFO - Successful migration of 2181 'proc_observations'. Elapsed time: 0.09 second(s).
tagbase_server_1  | 2023-04-04 05:59:57,282 - INFO - Data file /tmp/159903_2012_117464_eTUFF.txt successfully ingested into Tagbase DB. Total time: 0.19 second(s)

However, when we ingest the same file using an SFTP client, fswatch appears to fire an event as soon as the file is created, before any data has been written to it. We need to confirm that this is exactly what is happening. The put command and the resulting logs below show the bug.
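If the hypothesis above is right, one possible mitigation is to delay ingestion until the uploaded file is non-empty and has stopped growing. The following is only a minimal sketch of that idea, not the project's actual fix: the function name `wait_until_stable` and its parameters are hypothetical, and size stability is a heuristic (a stalled transfer would also look "stable").

```python
import os
import time


def wait_until_stable(path, interval=0.5, checks=3, timeout=30.0):
    """Hypothetical helper: return True once `path` is non-empty and its size
    has been unchanged for `checks` consecutive polls, or False on timeout."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file missing or unreadable for now; keep polling
        if size > 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # empty, missing, or still growing: start over
        last_size = size
        time.sleep(interval)
    return False
```

A caller would skip or retry the ingestion request when this returns False, so that the empty placeholder created at the start of an SFTP put is never submitted.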

put 159903_2012_117464_eTUFF.txt 159903_2012_117464_eTUFF.txt

postgis_1         | 2023-04-04 06:00:18.071 UTC [224] LOG:  checkpoint starting: time
fswatch_1         | Contents of /usr/src/app/staging_data/ changed; Processing: /usr/src/app/staging_data/159903_2012_117464_eTUFF.txt
fswatch_1         |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
fswatch_1         |                                  Dload  Upload   Total   Spent    Left  Speed
tagbase_server_1  | this operation accepts multiple content types, using application/octet-stream
tagbase_server_1  | this operation accepts multiple content types, using application/octet-stream
tagbase_server_1  | 2023-04-04 06:00:31,227 - INFO - /tmp/159903_2012_117464_eTUFF.txt
tagbase_server_1  | 2023-04-04 06:00:31,228 - INFO - etuff ingestion queue: ['/tmp/159903_2012_117464_eTUFF.txt']
tagbase_server_1  | 2023-04-04 06:00:31,236 - INFO - Processing etuff file: /tmp/159903_2012_117464_eTUFF.txt
100   109  100   109    0     0 {    0      0 --:--:-- --:--:-- --:--:--     0
fswatch_1         |   "code": "200",
fswatch_1         |   "elapsed": "0.01",
fswatch_1         |   "message": "Asynchronously ingesting 1 file(s) into Tagbase DB."
fswatch_1         | }
fswatch_1         |   7785      0 --:--:-- --:--:-- --:--:--  7785
tagbase_server_1  | 2023-04-04 06:00:31,255 - INFO - Successful INSERT of '/tmp/159903_2012_117464_eTUFF.txt' into 'submission' table.
postgis_1         | 2023-04-04 06:00:31.256 UTC [1897] tagbase@tagbase ERROR:  syntax error at or near ")" at character 82
postgis_1         | 2023-04-04 06:00:31.256 UTC [1897] tagbase@tagbase STATEMENT:  SELECT attribute_id, attribute_name FROM metadata_types WHERE attribute_name IN ()
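The failing statement ends in an empty `IN ()` list, which suggests that no metadata attribute names were parsed from the (still empty) file. Independently of the fswatch timing, the query could be guarded so that an empty list never reaches PostgreSQL. The sketch below is an assumption about how such a guard might look, not the server's actual code: `build_metadata_types_query` is a hypothetical helper, and the `%s` placeholders assume a psycopg2-style driver.

```python
def build_metadata_types_query(attribute_names):
    """Hypothetical helper: return (sql, params) for the metadata_types
    lookup, or None when there are no attribute names to look up."""
    names = list(dict.fromkeys(attribute_names))  # de-duplicate, keep order
    if not names:
        # An empty IN () is a syntax error in PostgreSQL, so skip the query.
        return None
    placeholders = ", ".join(["%s"] * len(names))
    sql = (
        "SELECT attribute_id, attribute_name FROM metadata_types "
        f"WHERE attribute_name IN ({placeholders})"
    )
    return sql, tuple(names)
```

When the helper returns None, the caller could log that the file contained no metadata and reject the submission instead of issuing the query.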