Sorry to bother you. I am currently trying to train a new CRP classifier with text input, and I would like to ask whether I could use part of your training samples. Is the HTML available for each of the 9010 samples used for your CRP classifier?
If possible, I would also like to know how the 9010 samples were taken from the Phishpedia dataset. I tried to use the domain name, for example 12tv, to match each sample page to a page in the original sets phish_sample_30k and benign_sample_30k. However, there seems to be no exact domain match for most of the CRP samples.
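A minimal sketch of this kind of domain matching is below, in case it helps to pin down where the mismatch comes from. It assumes that every sample can be identified by a name string (a folder name, domain, or URL) and that comparing the first domain label is enough. The function names `domain_key` and `match_samples` are hypothetical and not part of the Phishpedia code, and the actual folder naming in the datasets may differ.

```python
def domain_key(name: str) -> str:
    """Reduce a sample name, domain, or URL to a lowercase key for comparison."""
    name = name.strip().lower()
    # Drop a URL scheme and any path, if present.
    if "://" in name:
        name = name.split("://", 1)[1]
    name = name.split("/", 1)[0]
    # Drop a leading "www." so "www.example.com" and "example.com" agree.
    if name.startswith("www."):
        name = name[4:]
    # Assumption: the first label identifies the site,
    # e.g. "360converter.com" -> "360converter".
    return name.split(".", 1)[0]


def match_samples(crp_names, original_names):
    """Return (matched, unmatched) for CRP sample names against original names."""
    # Index the original samples by their normalized key.
    index = {}
    for original in original_names:
        index.setdefault(domain_key(original), []).append(original)

    matched = {}
    unmatched = []
    for name in crp_names:
        candidates = index.get(domain_key(name))
        if candidates:
            matched[name] = candidates
        else:
            unmatched.append(name)
    return matched, unmatched
```

Calling `match_samples` with the CRP sample names and the names from phish_sample_30k and benign_sample_30k would list which CRP samples have no counterpart under this normalization. Keeping only the first label is a crude choice: it would also merge different subdomains or top-level domains that share that label.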
Among the sample pages that do have a domain match in the original Phishpedia dataset, the screenshot of the CRP sample differs from that of the original sample. One example is the domain 360converter. Its screenshot in the 9010 set indicates that the sample is not a CRP.
However, its screenshot in the original set benign_sample_30k shows that the sample is a CRP.
In this case, how can I match the 9010 samples to the original samples in the Phishpedia evaluation sets? I look forward to your reply.