I want to fine-tune the model with data from a single domain, e.g. my-blog.com/article1, my-blog.com/article2, etc., to produce text that has the style of my-blog.com.
1) Can I use your training_utils/training.py with the "Links" control code? If so, how should I modify the make_tf_records.py to have all different URLs in the training dataset?
2) What must be the format of the input text_file? Should I just append the contents from all URLs into the same txt file?
This is fine; you can also use a separate control code like Blog, and then have the URL as the first part of every new article.
Yeah, concatenating is fine; prepend the URL as above. What I'd recommend, though, is creating one TFRecord per file; the training script will pick up all TFRecords in the active folder.
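To make the "one file per article, URL first" suggestion concrete, here is a minimal sketch of the preprocessing step that would sit in front of make_tf_records.py. The function name, file naming scheme, and directory layout are my own illustrative assumptions, not conventions from the repo:

```python
# Sketch: write one text file per article, with the article's URL on the
# first line so the "Links" control code can condition on it, then point
# make_tf_records.py at each file to get one TFRecord per article.
import os
import tempfile

def write_article_files(articles, out_dir):
    """articles: dict mapping URL -> article body text.

    Returns the list of file paths written, one per article,
    in sorted-URL order.
    """
    paths = []
    for i, (url, body) in enumerate(sorted(articles.items())):
        path = os.path.join(out_dir, f"article_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            # URL first, then the article body.
            f.write(url + "\n" + body.strip() + "\n")
        paths.append(path)
    return paths

articles = {
    "my-blog.com/article1": "First post about cooking.",
    "my-blog.com/article2": "Second post about travel.",
}
out_dir = tempfile.mkdtemp()
paths = write_article_files(articles, out_dir)
print(len(paths))  # prints 2
```

You would then run make_tf_records.py once per generated text file, so that each article becomes its own TFRecord and the training script picks them all up from the folder.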