twiddli / happypanda

A cross platform manga/doujinshi manager with namespace & tag support
http://pewpew.moe/project/happypanda

alternate metadata input procedure #68

Closed Exedge closed 8 years ago

Exedge commented 8 years ago

I have a large collection of doujins from a huge siterip. So many that I have no idea where to start! The problem is that the metadata is stored in each folder as a series of separate extensionless files; each of these files is really just a text file with one entry per line.

Here is an example of what is in each file.

In the Characters file is:

Aisha
Hongou Kazuto
Tōka

Is there any way to add support for this?

The code would basically need to do something like this:

for each folder

    get the info from each file (a simple read-in)

    put it into a correctly formatted info.json file (basically change the input method of the current one)

    delete all the blank-type (extensionless) files

Here is the JavaScript currently used to create the JSON file: New Text Document.txt

And here is a snapshot of the basic folder structure. Just reading in artist, title, characters, contents, and parody is really all it would need to do; the rest can be trashed.

capture

If it is possible, can someone make some kind of extension or extra program for this?

twiddli commented 8 years ago

An edge case like this won't get supported. There is a way, though: it involves making a small script (like you mention yourself). Currently, when importing galleries, Happypanda supports extracting metadata from files named info.json with eze's structure, which can be cut down to this (this feature will be expanded on sometime in the future):

{
  "gallery_info": {
    "title": "Hello Title",
    "title_original": "Hello Original Title",
    "category": "Manga",
    "tags": {
      "namespace": ["tag1"],
      "namespace2": ["tag1", "tag2"]
    },
    "language": "English",
    "translated": true
  },
  "image_api_key": "",
  "image_info": []
}

The most important keys are gallery_info, image_api_key and image_info. They need to be present. image_api_key and image_info can be left empty because the only reason they are included is for identifying purposes.

So what I would do in your case is write a script that gathers the metadata across the files, puts it in a dict identical to the one above, then serializes it as info.json.
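Roughly, the single-folder part of such a script could look like this. This is just a sketch: read_file and write_info_json are hypothetical names, and the metadata file names "Title", "Characters" and "Parody" plus the namespace choices are assumptions, not something Happypanda dictates.

import json
import os

def read_file(folder, name):
    # one entry per line, like the Characters file shown earlier in the thread
    path = os.path.join(folder, name)
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def write_info_json(folder):
    title_lines = read_file(folder, "Title")  # file name is an assumption
    metadata = {
        "gallery_info": {
            "title": title_lines[0] if title_lines else "",
            "tags": {
                "Characters": read_file(folder, "Characters"),
                "Parody": read_file(folder, "Parody"),
            },
        },
        "image_api_key": "",  # must be present, can stay empty
        "image_info": [],     # must be present, can stay empty
    }
    with open(os.path.join(folder, "info.json"), "w", encoding="utf-8") as f:
        json.dump(metadata, f)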

I'll be available for help in making the script if you need that (not to be confused with me making the script for you).

Exedge commented 8 years ago

I think I can write most of the code in Java; if not, I can try others. However, I might need a little help getting it to work through multiple folders. I plan on making it a standalone little application that you can drag and drop into the folder. When you click on it, it will go through all the subfolders and do the compiling/deleting.

Also, I really need to know how the database reads the metadata, specifically which parts are essential and which parts I can just leave blank when writing the JSON file.

Exedge commented 8 years ago

So far I have made a Python program that can read in the data and output it into a JSON file. However, the JSON file is only one long string for now as a test run. I still need to get it into the right format, and I need the deleteOld function to work. Once these are all done, I can work on making it run through multiple folders.

If anyone can help, I could really use it, especially with getting the deleteOld function to delete the files.

Here is the Python code; just change the .txt to .py: convertor.txt

I am using this to organize a torrent siterip of Pururin. Useful, since they seem to have shut down.

If I can get this to convert correctly, I can make it so anyone can do the same. The collection is around a hundred GB, which is even larger than it sounds.

twiddli commented 8 years ago

In the makeJSON function, I recommend making a dict first and then adding the necessary data to it, like so:

metadata = {
    "gallery_info": {},   # this is the key you put data in
    "image_api_key": "",  # these last two keys need to be present or else Happypanda won't accept this file (just leave them empty)
    "image_info": []
}

gallery_info = metadata['gallery_info']

gallery_info['title'] = getTitle(path)
gallery_info['category'] = getCategory(path)
gallery_info['artist'] = getArtist(path)
gallery_info['language'] = getLanguage(path)

gallery_info['tags'] = {}
gallery_tags = gallery_info['tags']

gallery_tags['Characters'] = getCharacters(path) # should return a list
# contents = getContents(path)   what is this?
gallery_tags['Group'] = getGroup(path) # should return a list
gallery_tags['Parody'] = getParody(path) # should return a list

Python has a JSON encoder/decoder in the standard library, so after filling out the metadata you just do:

import json

# in makeJSON function
# metadata = {} from earlier

with open("info.json", "w", encoding="utf-8") as infofile:
    json.dump(metadata, infofile)
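A side note, since some of the values contain non-ASCII characters (like Tōka above): json.dump escapes them by default, which is still valid JSON, but if you want the names to stay human-readable in info.json you can pass ensure_ascii=False, for example:

import json

metadata = {"gallery_info": {"title": "Tōka"}, "image_api_key": "", "image_info": []}

with open("info.json", "w", encoding="utf-8") as infofile:
    # ensure_ascii=False writes "Tōka" instead of "T\u014dka"
    json.dump(metadata, infofile, ensure_ascii=False)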

The deleteOld function is fine. You run it after making the JSON, I guess. One thing that I recommend is replacing all those if statements with a for loop:

import os

# after making the json
def deleteOld(path):
    files = ["__Artist__", "__Category__", "__Characters__", ...]
    for f in files:
        p = os.path.join(path, f)
        if os.path.isfile(p):
            os.remove(p)
Exedge commented 8 years ago

Thank you for the help. I will see if I can get this to at least work on a single folder today. I might even manage to get this done today!

Also, can you post the Python code that you use to populate from a directory? I can probably repurpose it to work with this. I just need the part that iterates through the subfolders from the top.
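For reference, the iteration itself can be very small; something like this visits every immediate subfolder of wherever the script is run from (a sketch: makeJSON and deleteOld stand for the functions in convertor.txt and are only referenced in the comment):

import os

def subfolders(root="."):
    # yield every immediate subfolder of the root, one per gallery
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            yield path

# in the convertor this would drive the existing single-folder code, e.g.:
# for folder in subfolders():
#     makeJSON(folder)
#     deleteOld(folder)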

Exedge commented 8 years ago

Here is a test folder with the test program. It should do what it is supposed to, but something is wrong.

$1,000,000 no Best Order!.txt

Change the extension to .zip and then extract it.

twiddli commented 8 years ago

I don't know how you're running the Python file, but you should do it via cmd. That way you'll see its output and whatnot. Here is the convertor.txt with fixed formatting.

I tried running it and it spewed an error saying some object wasn't serializable... I didn't fix it for you, but getTitle and getCategory don't return strings.
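For example, a getTitle that returns a plain string (so json.dump can serialize it) might be as simple as this; the __Title__ file name is an assumption, following the naming in the deleteOld list above:

import os

def getTitle(path):
    # return the first non-empty line of the title file as a plain string
    title_file = os.path.join(path, "__Title__")  # assumed file name
    if os.path.isfile(title_file):
        with open(title_file, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    return line.strip()
    return ""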

Exedge commented 8 years ago

OK, it looks like it does the stuff for the most part, but the formatting of the JSON is a bit off. Also, is there anywhere that we can put the contents file into it? If that file can also be read in, then the database can search for specific tags like non-h etc.

Here is the info that was generated, as well as the contents file.

info.txt Contents.txt

Once the formatting is fixed, then I can just focus on making it iterate through folders. The end is in sight!

twiddli commented 8 years ago

You can put the data from the contents file in the tag field:

gallery_tags['Misc'] = getContents(path) # should return a list

You can format the output of the json file with the optional indent parameter:

json.dump(metadata, infofile, indent=4)
Exedge commented 8 years ago

Here is the last stage of the single-folder version. All that needs to be fixed is that it needs to always output in the same order as the order that it takes things in. Any ideas? convertor.txt
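One way to pin the order is to build the dict as a collections.OrderedDict before dumping it; json.dump writes keys in insertion order unless sort_keys=True is passed (a sketch; on Python 3.7+ a plain dict behaves the same way):

import json
from collections import OrderedDict

gallery_info = OrderedDict()
gallery_info["title"] = "Hello Title"
gallery_info["category"] = "Manga"
gallery_info["tags"] = OrderedDict()

metadata = OrderedDict([
    ("gallery_info", gallery_info),
    ("image_api_key", ""),
    ("image_info", []),
])

with open("info.json", "w", encoding="utf-8") as infofile:
    json.dump(metadata, infofile, indent=4)  # keys keep the insertion order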

Exedge commented 8 years ago

OK, now I have a program that can work with a single folder, and it converts it over to a readable JSON. It seems like the order of the other stuff doesn't matter, but I have made it so that the gallery_info is all ordered. Now all that is left is to make it iterate through folders and do this in each one.

Here is the script so far:

convertor.txt

Also, can I post a link on here to the torrent containing the whole archive that this works with?

Exedge commented 8 years ago

I think I have just about done it! I have run a few tests and things seem to be working smoothly. I'm going to try some larger-scale stuff before I post it. It should be done within a few hours.

Exedge commented 8 years ago

OK, I have tested it and it seems to run smoothly! I believe I can now call this the Pururin convertor 1.0.

Here is what you need to do in order to convert your files:

step 1: put the convertor in the root folder of the files (it should contain only the folders with the individual chapters; use "extract here" when you extract the zips)

step 2: change the extension from .txt to .py

step 3: run it on the command line:
- type cd
- copy the directory of the program, paste it after that, and hit enter
- type py pururinConvertor.py and hit enter

step 4: make a sandwich, it should take a while

step 5: finally, add them to the database

this last step can take a very VERY VEEEERY long time if you do a large quantity.

And that's all!

Here is the file:

pururinConvertor.txt