Missing bucket information in table file

wanedoo commented 2 years ago

Hi Guilhem!

First of all, thank you for your efforts on this add-on! I'm not a developer myself, but this TA has come in really handy. However, I can see that the archiving script sometimes fails (returning error code 1; it looks like you are developing some more debugging output). I have verified that Splunk later archives the bucket successfully, but the table in the blob storage does not get updated. So I have a mismatch between the number of archived buckets and the number of rows in the table. And as the archived file name does not include the complete bucket name (for example, junk_07152862-20B6-483A-A08B-E09D99031E07_386.tgz), I'm not able to identify the correct tgz for the archived bucket.

We run version 1.0.4.

Are you able to include the whole bucket name? Or do you have any tips for identifying which archive a bucket is in?

Best regards, Alex

guilhemmarchand commented 2 years ago

Hi @wanedoo !

Thanks, you are welcome ;-)

So, first, I have some fresh news: a new release is about to be published (version 1.1), which is going to be much more powerful, with built-in features to better interact with Splunk / Azure, access the AZ table directly from Splunk, etc.

https://github.com/guilhemmarchand/TA-azure-blob-archiving/releases/tag/1.1.0

The missing part currently is the documentation, but I should publish it in a week or so.

Coming back on the issue you raised:

> So I have a mismatch between the number of archived buckets and the number of rows in the table. And as the archived file name does not include the complete bucket name (for example, junk_07152862-20B6-483A-A08B-E09D99031E07_386.tgz), I'm not able to identify the correct tgz for the archived bucket.

This sounds suspicious. The process only adds a record when the bucket was successfully archived; if the bucket was not successfully archived, Splunk automatically re-attempts to archive it on a later execution. We shouldn't reach this type of behaviour.

Can you locate a sample of the add-on logs from when this happens and share it, please? The name of the index and other information should be clearly visible in it.
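To make the expected behaviour concrete, here is a minimal sketch of the "record only on success, otherwise let Splunk retry" flow described above. It is an illustration under assumptions, not the add-on's actual code: `archive_bucket`, `upload_archive` and `record_bucket` are hypothetical names standing in for the blob upload and the table insert.

```python
import sys
from typing import Callable


def archive_bucket(
    bucket_path: str,
    upload_archive: Callable[[str], None],  # hypothetical: uploads the .tgz to blob storage
    record_bucket: Callable[[str], None],   # hypothetical: adds the row to the table
) -> int:
    """Return a process exit code: 0 on success, 1 on any failure."""
    try:
        upload_archive(bucket_path)
        # The table row is only written once the upload has succeeded.
        record_bucket(bucket_path)
    except Exception as exc:
        print(f"archiving failed for {bucket_path}: {exc}", file=sys.stderr)
        # A non-zero exit code tells Splunk the bucket was not archived,
        # so it keeps the bucket and calls the script again later.
        return 1
    return 0
```

With this ordering, a failed upload never produces a table row, and the retry is left to Splunk.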

wanedoo commented 2 years ago

Hi @guilhemmarchand ! Thank you for your reply!

I cannot find any "output" logs from Azure2Blob.py, only the splunkd BucketMover entries and the error messages. I attach some screenshots showing that Splunk successfully archives all buckets that initially had an error.

Screenshot 2022-02-18 at 09 42 24

Screenshot 2022-02-18 at 09 50 21

Screenshot 2022-02-18 at 09 51 02

Screenshot 2022-02-18 at 10 06 18

And as every .tgz file is named without the complete bucket name, it's a bit tricky to identify which archive a bucket is in. Can the file name of the archive include the whole bucket name?

Image 2022-02-18 at 09 58

Viasplunk.list is the output of the table file downloaded from Azure. blobs.list is a listing of the files in the blob storage. As you can see, there is a big difference between the two.

Best regards // Alex
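The comparison between blobs.list and Viasplunk.list described above can be sketched in a few lines. This is only a rough sketch, and it assumes that both files have been reduced to one archive name per line (the real format of either file may differ); `read_names` and `missing_from_table` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable, List


def read_names(path: str) -> List[str]:
    """Read one name per line, skipping blank lines (assumed file format)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_from_table(blob_names: Iterable[str], table_names: Iterable[str]) -> List[str]:
    """Return the blob names that have no matching entry in the table, sorted."""
    return sorted(set(blob_names) - set(table_names))
```

For example, `missing_from_table(read_names("blobs.list"), read_names("Viasplunk.list"))` would list the archives that exist in the blob storage but have no row in the table.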

guilhemmarchand / TA-azure-blob-archiving

Missing bucket information in table file #5