ligoudaner377 / font_translator_gan

Some problems encountered with font_file! #17

Open Chyamble opened 11 months ago

Chyamble commented 11 months ago

I want to build a large Japanese dataset for training, but I ran into some problems while building it.

1. I collected many free TTF and OTF files from two websites (https://www.freejapanesefont.com/ and https://fontmeme.com/ziti/gonta-kana-font/) and successfully rendered them to PNG images on Windows. When I moved this dataset to Ubuntu for training, the image file names were garbled. What is happening, and how should I handle it?

2. I then took the collected TTF and OTF files to Ubuntu and tried to build the dataset there, but it failed. Here are some of the failure messages:

ZenOldMincho-Medium.ttf failed!!!!!!!!!!!!!!!!!!
ZenOldMincho-Regular.ttf failed!!!!!!!!!!!!!!!!!!
ZenOldMincho-SemiBold.ttf failed!!!!!!!!!!!!!!!!!!
ZeroGothic.otf failed!!!!!!!!!!!!!!!!!!
偊傞杰乕.ttf failed!!!!!!!!!!!!!!!!!!
偊傞杰乕P.ttf failed!!!!!!!!!!!!!!!!!!
偼lateral偧唔僼only忞僩.ttf failed!!!!!!!!!!!!!!!!!!
剉偔偔杰偆偋勫偲.ttf failed!!!!!!!!!!!!!!!!!!
剉偔偔杰偆偋偲嵶.ttf failed!!!!!!!!!!!!!!!!!!
剉偔偔杰偆偋勁偲暍3.ttf failed!!!!!!!!!!!!!!!!!!
卒偐狠姧嫫-P.otf failed!!!!!!!!!!!!!!!!!!
PS.otf failed!!!!!!!!!!!!!!!!!!
卒偐狠姧嫫.otf failed!!!!!!!!!!!!!!!!!!
备偢 jun媞娤 N [M].ttf failed!!!!!!!!!!!!!!!!!!

How can I solve these problems effectively? Can the project be trained on Windows?

ligoudaner377 commented 11 months ago

Hi @Chyamble,

  1. Have you checked the file name encoding used when zipping and unzipping? I recommend checking that before looking at problem 2, since you already have the correct PNG files on Windows, right?

You can try training on Windows, but this code is not well tested there.

ligoudaner377 commented 11 months ago

By the way, this kind of problem happens all the time on a computer whose system language is Japanese, because the default character encoding there is JIS (probably; I'm not 100% sure). I have run into it before.
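To illustrate the kind of mismatch described above, here is a minimal sketch of how a file name gets garbled when its bytes are written in one encoding and read in another, and how the mistake can be reversed. The codec pair (`shift_jis` written, `gbk` read) and the sample name `える.ttf` are assumptions chosen for illustration; they are not confirmed by the thread, and the real pair depends on the machines and tools involved.

```python
def garble(name: str, written_as: str = "shift_jis", read_as: str = "gbk") -> str:
    """Simulate mojibake: encode with one codec, decode with another."""
    return name.encode(written_as).decode(read_as)


def repair(garbled: str, written_as: str = "shift_jis", read_as: str = "gbk") -> str:
    """Undo the mismatch by reversing the two steps of garble()."""
    return garbled.encode(read_as).decode(written_as)


original = "える.ttf"            # hypothetical font file name
broken = garble(original)        # what the wrong codec would display
print(repr(broken))
print(repair(broken) == original)
```

The repair only works when the wrong decoding was lossless; if a tool replaced undecodable bytes with placeholder characters, the original name cannot be recovered this way.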

Chyamble commented 11 months ago

Yes, I got the correct PNG files on Windows, and the image names are correct there. But after decompressing the archive on Ubuntu, the image names are garbled. I compressed the images directly with the 360 compression software and then moved the archive to Ubuntu for decompression. Could this cause any other errors?

ligoudaner377 commented 11 months ago

The problem must come from this step. Make sure the file name encoding is UTF-8 for both zipping and unzipping.

Chyamble commented 11 months ago

I tried using other compression software to pack the correct PNG images into a .zip file and then moved it to Ubuntu for decompression, but the names were still garbled. (Note that the images themselves are fine; only the names are wrong.) How should I handle this? How can I make sure that both the zip and unzip encodings are UTF-8?

Chyamble commented 11 months ago

My computer's system language is not Japanese; I am a Chinese student.

ligoudaner377 commented 11 months ago

I have already forgotten how I solved the problem, since it was almost 3 years ago (sorry about that). But it is clearly an encoding/decoding problem, so I don't think it is that hard to solve. Maybe try this post and see whether it solves your problem: https://superuser.com/questions/60379/how-can-i-create-a-zip-tgz-in-linux-such-that-windows-has-proper-filenames
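As one possible way to act on the encoding diagnosis, the sketch below extracts a zip archive on Linux while re-decoding the stored file names. It relies on two facts about Python's standard `zipfile` module: names without the UTF-8 flag (general-purpose bit 11, `0x800`) are decoded as cp437, and the original bytes can be recovered by encoding back to cp437. The source encoding `gbk` is an assumption (a typical default on a Chinese-language Windows system), and `decode_name` and `extract_with_encoding` are hypothetical helper names, not part of this repository.

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def decode_name(name: str, flag_bits: int, encoding: str = "gbk") -> str:
    """Return the file name as it was meant to read on the source machine."""
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly by zipfile
    try:
        # zipfile decoded the raw bytes as cp437; recover them and re-decode.
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # leave the name untouched if the guess does not fit


def extract_with_encoding(zip_path, dest, encoding: str = "gbk") -> None:
    """Extract every member, writing it under its re-decoded name."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = dest / decode_name(info.filename, info.flag_bits, encoding)
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
```

A call such as `extract_with_encoding("dataset.zip", "datasets/font")` (both paths are placeholders) would then replace the plain `unzip` step. The sketch does not sanitize member paths, so it should only be used on archives you created yourself.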

Chyamble commented 11 months ago

Thank you very much for your prompt answer. I will first try the solutions from the page you posted, which may solve these problems. (If it still doesn't work, I might bother you again. Sorry in advance.)

Chyamble commented 11 months ago

By the way, I have another question. I saw in your paper (Cross-language font style transfer) that you use knowledge transfer to alleviate model overfitting. Apart from that, what other methods does the project use to alleviate overfitting? I ask because when I ran the "english2chinese" experiment, the results were not ideal.

Chyamble commented 11 months ago

Correction: it should be "chinese2english", not "english2chinese".

ligoudaner377 commented 11 months ago

Which mode are you referring to? In my experience, unseen style is much more difficult than unseen content.

Chyamble commented 11 months ago

“chinese2english”

Chyamble commented 11 months ago

  1. I would still like to know what methods, other than knowledge transfer, the project uses to prevent overfitting. If you have time, please reply. Thank you very much.

  2. I used the Japanese fonts I compiled yesterday as a test set for an unseen language (both style and content are unseen) and successfully generated images, but the results cannot be used to compute the evaluation metrics. The generated images are not ordered regularly as in "english2chinese" (generated img, gt_img, generated img, gt_img, ...); instead they come in large chunks, which makes it impossible to run the metric evaluation. How should I handle this?

ligoudaner377 commented 11 months ago

If I remember correctly, this repo doesn't support "chinese2english"; achieving it requires an additional data partition (see Figure 6 in "Cross-language font style transfer"). Could you tell me how you implemented it? The implementation may affect the final results a lot.

ligoudaner377 commented 11 months ago

  1. I briefly checked the paper, and I think knowledge transfer (Section 4.6) is the only method we used to prevent overfitting.
  2. If I understand you correctly, you have already generated some images but cannot evaluate them because the data structure differs from the "english2chinese" results? If so, I think you can write a script that converts the data structure so that it matches "english2chinese" exactly.

Chyamble commented 11 months ago

  1. Yes, that's what I did. I implemented "chinese2english" by re-partitioning the dataset as described in the paper, but it produced very bad results. (That is why I wondered what other methods could prevent overfitting. Unfortunately, you said knowledge transfer is the only one.)
  2. I don't understand the logic behind this, so it is a bit difficult for me.

ligoudaner377 commented 11 months ago

Can you give me one or two examples, or an illustration, that describes your current problem? I'm not sure whether I understand it correctly.

Chyamble commented 11 months ago

I don't know whether my description is clear, but I'll try my best. For example, the output should be ordered as "xyd_ぇ_generated_img,xyd_ぇ_gt_img,xyd_う_generated_img,xyd_う_gt_img,xyd_い_generated_img,xyd_い_gt_img.....", but it is actually "xyd_ぇ_generated_img,xyd_う_generated_img,xyd_い_generated_img....". I guess the images are generated out of order, which makes evaluation impossible. I don't know how to handle it.

Chyamble commented 11 months ago

In addition, I ran an experiment without the L1 loss this afternoon. My results are far from the numbers in the paper (the "w/o L1" rows of Table 3, "english2chinese", in the Cross-language font style transfer paper). Why is there such a big gap?

unseen characters

                      MAE     SSIM     MSSIM     ACC(c)   mFID(c)    ACC(s)   mFID(s)
FTransGAN w/o L1      0.205   0.282    0.206     87.0     93.6       8.7      639.7
My result w/o L1      0.282   0.1476   0.020     28.4     430.521    1.67     850.85

unseen style

                      MAE     SSIM     MSSIM     ACC(c)   mFID(c)    ACC(s)   mFID(s)
FTransGAN w/o L1      0.235   0.212    0.173     87.9     209.3      2.5      516.4
My result w/o L1      0.300   0.0974   0.00434   44.14    521.3288   2.4      747.4437

ligoudaner377 commented 11 months ago

As I mentioned before, the partition rule of the dataset has changed, which might be the reason.

Can you leave an e-mail address here? I can send you the code and dataset for "Cross-language font style transfer". However, I am currently doing an internship, and these files are stored on my university's server, so I can only access them after I come back (after 12/1). Let me know whether you have any urgent deadlines.

ligoudaner377 commented 11 months ago

  1. "xyd_ぇ_generated_img,xyd_ぇ_gt_img,xyd_う_generated_img,xyd_う_gt_img,xyd_い_generated_img,xyd_い_gt_img....."
  2. "xyd_ぇ_generated_img,xyd_う_generated_img,xyd_い_generated_img...."

From your description, the only difference between 1 and 2 is with/without GT images. Am I correct?

Chyamble commented 11 months ago

  1. "xyd_ぇ_generated_img,xyd_ぇ_gt_img,xyd_う_generated_img,xyd_う_gt_img,xyd_い_generated_img,xyd_い_gt_img....."
  2. "xyd_ぇ_generated_img,xyd_う_generated_img,xyd_い_generated_img,xyd_ぇ_gt_img,xyd_う_gt_img,xyd_い_gt_img,...."

Both have GT images and generated images; only the ordering is different. I guess this is why they cannot be evaluated, but I don't know the specific reason.
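For the ordering problem described here, a conversion script could pair each generated image with its ground-truth image by name instead of relying on position. The following is a minimal sketch under stated assumptions: file names follow the pattern shown in this thread (`<font>_<char>_generated_img` and `<font>_<char>_gt_img`, with an optional extension), and `pair_images` and `interleave` are hypothetical helpers, not functions from this repository. Whether the evaluation script actually depends on this order is not confirmed in the thread.

```python
import os
from collections import defaultdict

# Suffixes taken from the file names quoted in the thread.
SUFFIXES = {"_generated_img": "generated", "_gt_img": "gt"}


def pair_images(names):
    """Group names by their shared prefix and return (generated, gt) pairs."""
    groups = defaultdict(dict)
    for name in names:
        stem, _ext = os.path.splitext(name)  # tolerate names with an extension
        for suffix, kind in SUFFIXES.items():
            if stem.endswith(suffix):
                groups[stem[: -len(suffix)]][kind] = name
                break
    pairs = []
    for key in sorted(groups):
        entry = groups[key]
        # Keep only complete pairs; unmatched images are skipped.
        if "generated" in entry and "gt" in entry:
            pairs.append((entry["generated"], entry["gt"]))
    return pairs


def interleave(names):
    """Return names ordered as generated, gt, generated, gt, ..."""
    return [name for pair in pair_images(names) for name in pair]


chunked = [
    "xyd_ぇ_generated_img", "xyd_う_generated_img", "xyd_い_generated_img",
    "xyd_ぇ_gt_img", "xyd_う_gt_img", "xyd_い_gt_img",
]
for generated, gt in pair_images(chunked):
    print(generated, gt)
```

Pairing by the shared prefix makes the result independent of the order in which the images were written, so the same helper works for both orderings listed above.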