tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

Extract Path Contexts Only #110

Closed (Avv22 closed this issue 2 years ago)

Avv22 commented 2 years ago

Hello,

Given that both of your models, code2seq and code2vec, were originally built to predict a method name from the method body represented as path contexts, can you please explain how to extract the path contexts? We are only looking for a representation of the source code.

Thank you.

urialon commented 2 years ago

Hi @Avra2 ,

What exactly do you mean by "extract path context"? Do you want the paths? (They are in the raw dataset.) The representations of these paths? (Do you really want ~200 vectors for every example?) The aggregation of these 200 vectors? (This is the "code vector".)

Uri
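To make the third option concrete, here is a minimal sketch of attention-weighted aggregation in the style described for code2vec: each path-context vector is scored against a learned attention vector, the scores are normalized with a softmax, and the weighted sum is the "code vector". This is an illustration only, not the repository's code; the function names and the toy two-dimensional vectors are assumptions, and code2seq's own decoder may combine the path representations differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(context_vectors, attention_vector):
    """Return (code_vector, weights) for a list of path-context vectors.

    Hypothetical helper: scores each context vector by its dot product
    with the attention vector, then takes the softmax-weighted sum.
    """
    scores = [
        sum(c * a for c, a in zip(vec, attention_vector))
        for vec in context_vectors
    ]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    code_vector = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return code_vector, weights


# Toy example: two 2-dimensional "path-context vectors".
vectors = [[1.0, 0.0], [0.0, 1.0]]
code_vector, weights = aggregate(vectors, attention_vector=[0.0, 0.0])
```

With a zero attention vector every context gets the same weight, so the code vector is simply the mean of the inputs; a trained attention vector would shift weight toward the contexts that matter most for the prediction.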


lyriccoder commented 2 years ago

It seems @Avra2 wants the paths. Build the Java extractor inside the project, or locate an already compiled jar file.

You can then run it with java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.2-SNAPSHOT.jar JavaExtractor.App --max_path_length=8 --max_path_width=2 --file filename, where filename is a file containing a Java function.
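If you want to call the extractor from a script, the command above can be wrapped as in the sketch below. The jar path, class name, and flags are copied from the command in this comment; the helper names build_extractor_command and extract_paths are hypothetical, and the sketch assumes java is on the PATH and that the jar has already been built.

```python
import subprocess

# Jar location as given in the command above; adjust if your build differs.
DEFAULT_JAR = "JavaExtractor/JPredict/target/JavaExtractor-0.0.2-SNAPSHOT.jar"


def build_extractor_command(source_file, jar_path=DEFAULT_JAR,
                            max_path_length=8, max_path_width=2):
    """Build the argument list for the JavaExtractor command line."""
    return [
        "java", "-cp", jar_path, "JavaExtractor.App",
        f"--max_path_length={max_path_length}",
        f"--max_path_width={max_path_width}",
        "--file", source_file,
    ]


def extract_paths(source_file, **options):
    """Run the extractor and return whatever it prints to stdout.

    Raises subprocess.CalledProcessError if the extractor exits non-zero.
    """
    result = subprocess.run(
        build_extractor_command(source_file, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, extract_paths("Example.java") would run the extractor on a hypothetical file Example.java with the same limits as the command above.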

Suppose you have the following code:

public boolean f(Set<String> set, String value)
{
    for (String entry : set)
    {
        if (entry.equalsIgnoreCase(value))
        {
            return true;
        }
    }
    return false;
}

The extractor translates this code into the following list of paths:

set GenericClass1|Prm|Mth|Bk|Foreach|VDE|VD|VDID0 entry
METHOD_NAME Nm1|Mth|Prm|GenericClass1 set
set GenericClass1|Prm|Mth|Bk|Foreach|VDE|Cls0 string
set VDID0|Prm|GenericClass|Cls0 string
string Cls0|GenericClass|Prm|Mth|Bk|Foreach|VDE|VD|VDID0 entry
string Cls0|GenericClass|Prm|Mth|Prm|Cls1 string
set GenericClass|Cls0 string
set GenericClass1|Prm|Mth|Bk|Foreach|Nm1 set
string Cls0|GenericClass|Prm|Mth|Bk|Foreach|Nm1 set
boolean Prim0|Mth|Prm|GenericClass|Cls0 string
METHOD_NAME Nm1|Mth|Prm|GenericClass|Cls0 string
set GenericClass1|Prm|Mth|Prm|VDID0 value
set GenericClass1|Prm|Mth|Bk|Ret|BoolEx0 false
set GenericClass1|Prm|Mth|Bk|Foreach|Bk|If|Cal0|Nm3 equals|ignore|case
boolean Prim0|Mth|Prm|GenericClass1 set
string Cls0|GenericClass|Prm|Mth|Prm|VDID0 value
set GenericClass1|Prm|Mth|Bk|Foreach|Bk|If|Cal0|Nm2 value
set GenericClass1|Prm|Mth|Bk|Foreach|Bk|If|Cal0|Nm0 entry
set VDID0|Prm|GenericClass1 set
string Cls0|GenericClass|Prm|Mth|Bk|Ret|BoolEx0 false
string Cls0|GenericClass|Prm|Mth|Bk|Foreach|VDE|Cls0 string
set GenericClass1|Prm|Mth|Prm|Cls1 string

Did you want those lists of paths (what you called "path context")?
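If the paths are what you need, each line of the listing above can be split into a source token, an AST path, and a target token. The sketch below assumes the whitespace-separated layout shown in this comment, with "|" separating both path nodes and sub-tokens; the raw extractor output may be laid out differently, and the names PathContext and parse_path_context are hypothetical.

```python
from typing import List, NamedTuple


class PathContext(NamedTuple):
    source: List[str]  # sub-tokens of the start terminal
    path: List[str]    # AST node labels along the path
    target: List[str]  # sub-tokens of the end terminal


def parse_path_context(line: str) -> PathContext:
    """Parse one 'source path target' line from the listing above."""
    source, path, target = line.split()
    return PathContext(source.split("|"), path.split("|"), target.split("|"))


example = "set GenericClass1|Prm|Mth|Bk|Foreach|Bk|If|Cal0|Nm3 equals|ignore|case"
context = parse_path_context(example)
```

Here context.target holds the three sub-tokens of equalsIgnoreCase, and context.path holds the node labels between the two terminals.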

Avv22 commented 2 years ago

@lyriccoder @urialon, thanks for the response. By "path context" I don't mean the paths extracted by the parser, but the aggregated path representation that your model learns with the help of the attention mechanism, since this should be the vector that contributes most to the method name. Can we use this vector as an embedding that represents the whole file, or is it only useful for your task, method name prediction? If the vector can be used for other tasks, can you please show how to extract it from your network during training?

Thanks.

urialon commented 2 years ago

Did you try this? https://github.com/tech-srl/code2seq#step-4-manual-examination-of-a-trained-model


Avv22 commented 2 years ago

@urialon. Thank you.