leedrake5 / Russia-Ukraine

Equipment Loss Tracking
MIT License

number before the system #28

Open cpytaa opened 8 months ago

cpytaa commented 8 months ago

Hi, apologies for an additional inquiry.

In this sheet (https://github.com/leedrake5/Russia-Ukraine/blob/main/data/bySystem/Totals/Full/2023-11-13.csv), there seem to be some numbers before the system name. Are these numbers supposed to align with the total column on the right? If so, there seem to be inconsistencies in this version.
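One way to check this is to split the leading number off the system name and compare it with the total column. Below is a minimal Python sketch, assuming the scraped name looks like `"98 T-62M"`. The sample rows, the `(name, total)` layout and the helper name `split_leading_count` are made up for illustration and are not taken from the repository's CSV.

```python
import re

# Hypothetical helper: splits "98 T-62M" into (98, "T-62M").
# Names without a leading "<digits><whitespace>" prefix come back as (None, name).
LEADING_COUNT = re.compile(r"^(\d+)\s+(.*)$")

def split_leading_count(name):
    match = LEADING_COUNT.match(name)
    if match is None:
        return None, name
    return int(match.group(1)), match.group(2)

# Made-up sample rows: (system name as scraped, value in the total column).
rows = [("98 T-62M", 98), ("12 T-90A", 15), ("BMP-2", 7)]

# Collect the systems whose leading number disagrees with the total.
mismatches = []
for name, total in rows:
    count, system = split_leading_count(name)
    if count is not None and count != total:
        mismatches.append((system, count, total))

print(mismatches)
```

A designation such as `2S1 Gvozdika` is left alone because no whitespace follows the digits, but a name that genuinely begins with a number and a space would be misread as a count, so the output would need a manual look.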

I am wondering whether it might be easier to leave the system name without the count in front of it, so that it is easier to merge with the classes.csv sheet.
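To illustrate why bare system names would make the merge easier: once the count prefix is stripped, the name can be used directly as a lookup key. This is a rough Python sketch only. The class labels, the dictionary standing in for classes.csv and the helper `strip_count` are hypothetical, and the real column layout of classes.csv may differ.

```python
import re

def strip_count(name):
    # Drop a leading "<digits><whitespace>" prefix, e.g. "98 T-62M" -> "T-62M".
    return re.sub(r"^\d+\s+", "", name)

# Hypothetical stand-in for classes.csv: system name -> class.
classes = {"T-62M": "Tanks", "BMP-2": "Infantry Fighting Vehicles"}

# Made-up scraped names, some with a count in front.
scraped = ["98 T-62M", "BMP-2", "3 Unknown truck"]

# Pair each cleaned name with its class; None marks a name with no match.
merged = []
for name in scraped:
    system = strip_count(name)
    merged.append((system, classes.get(system)))

print(merged)
```

Names that come back with `None` are the ones that would still need a manual match.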

Just a thought.

Thank you.

-Teresa

leedrake5 commented 8 months ago

This issue has been the bane of my existence with regard to scraping from Oryx's page. If you know a convenient way to scrub the numbers, please let me know!

cpytaa commented 8 months ago

Oh, I can remove the numbers, if they are followed by a blank before the weapon name, with a string function (subinstr) in Stata. If you like, I can upload a file here so you can review it and see if that is okay.

cpytaa commented 8 months ago

system_review_20231115.xlsx

Based on matching against the daily files: items with a match are in the 2nd sheet, whereas items for which I could not find an actual match are in the 1st sheet. Let me know your thoughts.

leedrake5 commented 8 months ago

> Oh, I can remove the numbers, if they are followed by a blank before the weapon name, with a string function (subinstr) in Stata. If you like, I can upload a file here so you can review it and see if that is okay.

Looks good - can you paste the Stata function? I can recode it in R.

cpytaa commented 8 months ago

Yeah. The number in the code depends on the "counts of weapons" written in the text. For example, if one day there is another number, say "3009 tanks", then you can just change the "2000" to "3009". Hope this makes sense.

image

There seems to be some problem with pasting the commands here, as some of the quotation marks or apostrophes just disappear. I am attaching the code in a separate file: code_for_leading_number.txt

valerinaryshkin commented 6 months ago

I used a Python script to fix this:

```python
import glob
import re

# Rewrite every CSV in the current directory in place.
for path in glob.glob("*.csv"):
    with open(path, "r", encoding="utf8") as f:
        text = f.read()
    # Drop the digits and the whitespace character that directly follow ,"
    cleaned = re.sub(r'\,+\"+\d+\s', ',"', text)
    with open(path, "w", encoding="utf8") as f:
        f.write(cleaned)
```
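To see what the substitution in the script above does, here is a small check on single lines rather than whole files. The sample lines are made up, and the real CSVs may have different columns.

```python
import re

# Same pattern as in the script: comma(s), quote(s), digits, one whitespace character.
pattern = r'\,+\"+\d+\s'

def clean(line):
    return re.sub(pattern, ',"', line)

# Quoted field after a comma: the count and the blank are removed.
quoted = clean('Russia,"98 T-62M",98')

# Unquoted field: the pattern needs a quote, so nothing changes.
unquoted = clean('Russia,98 T-62M,98')

# Empty field before the name: the run of commas collapses to a single comma.
empty_before = clean('Russia,,"98 T-62M",98')

print(quoted)
print(unquoted)
print(empty_before)
```

Two things may be worth keeping in mind: names in unquoted fields are not touched, and because the pattern begins with `\,+`, an empty column directly before the name would be dropped. The script also overwrites the files in place, so running it on a copy first seems prudent.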