qian256 / qian256.github.io

My personal blog hosted by Github. Theme forked from poole/hyde.
http://longqian.me

Tags generated, count 0 #2

Open andy5995 opened 6 years ago

andy5995 commented 6 years ago

Tags generated, count 0

It seems to be breaking before reading the tags.

After adding a little debug code:

$ python3 tag_generator.py 
_posts/2017-10-29-robert_koch_institut.md
_posts/2017-10-29-कोगनीटिव_बिहेवियर_थरेपी.md
_posts/2017-10-27-a-canvas-of-the-minds.md
_posts/2017-10-28-side_effects_book_alison_bass.md
Tags generated, count 0

It never gets to here:

for tag in total_tags:
    tag_filename = tag_dir + tag + '.md'
    f = open(tag_filename, 'a')
+  print(tag_filename)

Maybe the format of my post file?

tags:
    - wordpress
    - personal_stories
    - collaborative
    - blogs

qian256 commented 6 years ago

Do you still have the problem? It may have happened because there was no folder called tag when the script was called. I fixed it in the new script.
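
A minimal sketch of that kind of guard, assuming the output folder is named tag/ as in the script. The helper name ensure_tag_dir is hypothetical and is not taken from tag_generator.py:

```python
import os


def ensure_tag_dir(tag_dir='tag/'):
    """Create the tag output folder if it is missing (hypothetical helper)."""
    # exist_ok=True makes this a no-op when the folder already exists
    os.makedirs(tag_dir, exist_ok=True)
    return tag_dir
```

Calling it once before any tag page is opened for writing means open() no longer fails on a missing folder.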

linotes commented 6 years ago

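After I moved the files in _post into newly created subdirectories, the tags could no longer be generated. Could you please update the script? Thanks!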

leucotic commented 5 years ago

Hi, I am using Jekyll to build my site and I have a similar problem. In fact, running the script ended up deleting my existing, manually created tagname.md pages. I think the issue may be that it only looks through

post_dir = '_posts/'

However, I want it to generate tags from other pages in other places. I don't know whether I could do that with site.pages or something similar; I do not know much about Python. There may also be a separate issue: when I tried moving some of the posts/pages from which I want to generate tags into the _posts/ directory, I got this error:

 File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1882: ordinal not in range(128)

I don't think I have anything particularly weird in my YAML, only numbers, letters, :, ], /, and -.

Or does it go through the entire post? I'm not sure how to deal with this, or whether my analysis is completely wrong. Any advice would be appreciated!
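
The traceback suggests the file was opened without an explicit encoding, so Python fell back to the locale default (ASCII here) and failed on a non-ASCII byte in whatever part of the file it had read, which is not necessarily the YAML header. A minimal sketch of the usual workaround, assuming the posts are saved as UTF-8; read_post_lines is a hypothetical helper name:

```python
def read_post_lines(path):
    """Return the lines of a post, decoding it as UTF-8 (assumed encoding)."""
    # Passing encoding explicitly avoids the locale default, which may be ASCII
    with open(path, 'r', encoding='utf-8') as f:
        return f.read().splitlines()
```

If some posts use a different encoding, this will still raise UnicodeDecodeError for those files, so the encoding argument would need to match how the posts were saved.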

AleksandrHovhannisyan commented 4 years ago

For me, this was happening because my _pages/ directory organizes blog posts into subfolders based on the primary category they fall into.

As written, the script only works if all of your blog posts sit directly in the _pages directory. To also traverse all nested subdirectories and process those blog posts, use this:

for dir_name, subdir_list, file_list in os.walk(post_dir):
    for file in file_list:
        f = open(os.path.join(dir_name, file), 'r', encoding='utf-8')
        crawl = False
        # rest of the script

5nizza commented 3 years ago

To add to @AleksandrHovhannisyan's comment, here is code that supports subdirectories as well as tags given as an inline list (so you can write things like tags: [one, two, 'first tag', 'second tag']):

import glob
import os

post_dir = '_posts/'
tag_dir = 'tag/'

file_names = glob.glob(post_dir + '**/*.md', recursive=True)

tags = set()
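for file in file_names:
    # Read as UTF-8 so non-ASCII post content does not raise UnicodeDecodeError
    with open(file, 'r', encoding='utf-8') as f:
        inside_header = False
        for line in f:
            line = line.strip()
            if line == '---':
                if inside_header:
                    break  # end of front matter: continue to the next file
                inside_header = True
            if line.startswith('tags:'):
                tags_token = line[5:].strip()
                if tags_token.startswith('['):
                    # inline list, e.g. tags: [one, two, 'first tag']
                    tags_token = tags_token.strip('[]')
                    new_tags = [t.strip().strip(" '\"")
                                for t in tags_token.split(',')]
                else:
                    # space-separated, e.g. tags: one two
                    new_tags = tags_token.split()
                # skip empty strings so an empty list does not create '.md'
                tags.update(t for t in new_tags if t)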

old_tags = glob.glob(tag_dir + '*.md')
for tag in old_tags:
    os.remove(tag)

if not os.path.exists(tag_dir):
    os.makedirs(tag_dir)

for tag in tags:
    tag_filename = tag_dir + tag + '.md'
    # The old tag pages were removed above, so 'w' creates each page from scratch
    with open(tag_filename, 'w', encoding='utf-8') as f:
        write_str = '---\nlayout: tagpage\ntitle: "Tag: ' + tag + '"\ntag: ' + tag + '\nrobots: noindex\n---\n'
        f.write(write_str)

print("Tags generated ({count}): {tags}".format(count=len(tags),
                                                tags=', '.join(tags)))
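
The script above reads tags only from the tags: line itself, so the block-style list shown in the first comment (one "- tag" item per line under tags:) yields no tags. A rough sketch of a parser that covers all three forms follows. The function name parse_front_matter_tags is hypothetical, and this is a line-based approximation rather than a real YAML parser; a YAML library would be more robust.

```python
def parse_front_matter_tags(lines):
    """Collect tags from the front matter of one post (hypothetical helper).

    Handles `tags: [a, 'b c']`, `tags: a b`, and a block list of `- a` items.
    """
    tags = []
    inside_header = False
    in_block_list = False
    for raw in lines:
        line = raw.strip()
        if line == '---':
            if inside_header:
                break  # closing delimiter: stop reading this post
            inside_header = True
            continue
        if not inside_header:
            continue
        if line.startswith('tags:'):
            token = line[len('tags:'):].strip()
            if token.startswith('['):
                items = token.strip('[]').split(',')
                tags.extend(t.strip().strip('\'"') for t in items)
            elif token:
                tags.extend(token.split())
            else:
                in_block_list = True  # items follow on the next lines
        elif in_block_list and line.startswith('- '):
            tags.append(line[2:].strip().strip('\'"'))
        elif line:
            in_block_list = False  # any other key ends the block list
    return [t for t in tags if t]
```

It takes the lines of a single post, so it could be called once per file inside the loop over file_names, with the result passed to tags.update().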