Open presidentofyes12 opened 1 year ago
json.loads() takes a JSON string and converts it into a dictionary. By JSON string we mean a JSON object enclosed in quotes, such as '{"name":"John", "age":30, "car":null}'. You can't access any value with dictionary methods while it is wrapped in quotes, because it behaves like an ordinary string. That's why you convert it into a dictionary with json.loads(STRING_NAME).
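As a minimal sketch of the point above (the variable names raw and data are only illustrative), the same text is a plain string before json.loads() and a dictionary after it:

```python
import json

# A JSON object written as a Python string. Single quotes are used on the
# outside so the double quotes required by JSON can appear inside.
raw = '{"name":"John", "age":30, "car":null}'

# While it is still a string, there are no keys to look up.
print(type(raw).__name__)

# json.loads() parses the string and returns a dict for a JSON object.
data = json.loads(raw)
print(type(data).__name__)

# Values can now be accessed by key; JSON null becomes Python None.
print(data["name"])
print(data["car"] is None)
```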
The text you are loading does not contain data in JSON format.
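One way to confirm this for a given piece of text is to catch json.JSONDecodeError, which json.loads() raises when the input is not valid JSON. The helper name try_parse_json below is hypothetical and is not part of any program mentioned in this thread; it is only a sketch of the check:

```python
import json

def try_parse_json(text):
    """Return (value, None) on success or (None, message) if text is not JSON."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # lineno, colno and msg are attributes of json.JSONDecodeError.
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"

# Source code stored in a .txt file is not JSON, so parsing fails.
value, error = try_parse_json("print('hello world')")
print(value)
print(error)
```

If the error comes from a step that expects JSON but receives plain source code, a check like this can show which input is responsible.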
I had the idea to make a dataset out of the code samples in IBM's Project CodeNet (a little less than 14 million in total). I converted them all into text files (ending in .txt rather than .py, .c, etc.). After a couple of earlier encoding issues, which I solved by removing about 1 million files with incorrect encodings (reducing the total to about 12 million files), I tried to run it again. It then gave another, different error:
This has had me stuck for a while. The same error had come up before, and I decided to move the folders that caused it somewhere else to be fixed later, but there were 4053 folders overall and I couldn't move all of the problematic ones. I also looked through my dataset to make sure there were no shell files, JSON files, or CSV files that could cause such an issue, but there were none; it's just .txt files, the cat.pdf file, and a Word doc. Without the dataset, I've been able to run the program with little difficulty, and certain (if not most) folders were suitable to use as datasets.
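A check like the one described above can be automated across all folders. The sketch below is only an illustration: count_extensions is a hypothetical helper, and the temporary directory stands in for the real dataset root, whose path is not given here.

```python
import tempfile
from collections import Counter
from pathlib import Path

def count_extensions(root):
    """Count files under root by lowercased extension, recursing into subfolders."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return counts

# Demonstration on a throwaway directory; replace tmp with the dataset root.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("x")
    (Path(tmp) / "sub" / "b.TXT").write_text("y")
    (Path(tmp) / "sub" / "cat.pdf").write_text("z")
    demo_counts = count_extensions(tmp)

print(dict(demo_counts))
```

Any extension other than the expected ones would show up as its own entry in the result.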
Why is this error happening, and how can I fix it? I am using Ubuntu 22.04 and Python 3.11.4. I've placed the program files on an external hard drive, where the program has been able to run. If the error cannot be fixed, is there any way to work around it?